Bug 25958

Summary: XFree86's savage and vesa drivers can panic the kernel!
Product: Base System
Reporter: Nate Dannenberg <natedac>
Component: i386
Assignee: msmith
Status: Closed FIXED    
Severity: Affects Only Me    
Priority: Normal    
Version: Unspecified   
Hardware: Any   
OS: Any   

Description Nate Dannenberg 2001-03-21 04:40:01 UTC
Using the X 4.0.2 "savage" and "vesa" video drivers causes a kernel panic.  
Using the "vga" driver, however ugly the display is, does not panic the 
kernel.

The last time I tried 4.0.2 before this latest attempt, I also tried the
"i128" driver, which the website says is a Number Nine driver.  It didn't
work either, causing a kernel panic/reboot.  Elsewhere in the website it
clarifies that this driver is intended for Imagine-128 chipsets.

More information will follow as I learn more about X 4.0.2, including 
further bug reports, if they seem necessary.

Fix: 

No idea.  Using the "vga" driver provides a temporary solution; however, by 
the nature of the vga driver, no acceleration or high-quality displays 
are available in this mode.

How-To-Repeat: 
You'll probably need an IBM Aptiva and/or a Number Nine SR9 video card 
(which is out of production anyway).  Compile X 4.0.2 from ports, create a 
config file with xf86cfg, and substitute either the "savage" or "vesa" 
driver in place of the "vga" driver in the file.

Try to start X either as root or with Xwrapper, and it should panic the 
kernel (at least, it causes the system to reboot; you never actually get 
to see the kernel messages, as the console does not switch back to where 
the messages are appearing).

You may also be able to crash the machine simply by running xf86cfg 
without the -textmode switch.  In my setup, this will crash the machine.

If you force text mode with the above mentioned switch, xf86cfg does 
create a config file which does work, albeit only in 320x240/8-bit and 
640x480/4-bit modes.
Comment 1 Nate Dannenberg 2001-03-25 03:10:15 UTC
I have come up with additional information regarding this problem, in that
I now know how to get FreeBSD to provide the crash information for me at
startup, via its crashdump feature (see dumpon(8) and savecore(8)).

The problem was that XFree86 4.0.2 (and 4.0.3) will crash my system, an
IBM Aptiva (550 MHz Athlon, 96M Ram, Number Nine SR9 with S3 Savage4
chipset and 8M ram), if I use the "savage" or "vesa" drivers.  Using the
"vga" driver provides a temporary solution, however it's not a suitable
replacement due to the very nature of plain VGA modes.

----- Text Import Begin -----

su-2.04# gdb -k
GNU gdb 4.18
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-unknown-freebsd".
(kgdb) symbol-file kernel
Reading symbols from kernel...(no debugging symbols found)...done.
(kgdb) exec-file /usr/crash/kernel.0
(kgdb) core-file /usr/crash/vmcore.0
IdlePTD 3465216
initial pcb at 2ba4a0
panicstr: general protection fault
panic messages:
---
Fatal trap 9: general protection fault while in kernel mode
instruction pointer     = 0x8:0xc024b4a3
stack pointer           = 0x10:0xc76a9cf0
frame pointer           = 0x10:0xc76a9d10
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = resume, IOPL = 3
current process         = 344 (XFree86)
interrupt mask          = none
trap number             = 9
panic: general protection fault
syncing disks... 30 29 21 10
done
Uptime: 11m46s
dumping to dev #ad/0x20001, offset 205184
dump ata0: resetting devices .. done
95 94 93 92 91 90 89 88 87 86 85 84 83 82 81 80 79 78 77 76 75 74 73 72 71 70 69
68 67 66 65 64 63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42
41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
---
#0  0xc013ebba in dumpsys ()
(kgdb) where
#0  0xc013ebba in dumpsys ()
#1  0xc013e9db in boot ()
#2  0xc013ed58 in poweroff_wait ()
#3  0xc02575bd in trap_fatal ()
#4  0xc0256fcb in trap ()
#5  0xc024b4a3 in i686_mrstoreone ()
#6  0xc024b3e9 in i686_mrstore ()
#7  0xc024ba21 in i686_mrset ()
#8  0xc0252a81 in mem_range_attr_set ()
#9  0xc02529fd in mem_ioctl ()
#10 0xc02528b4 in mmioctl ()
#11 0xc017545e in spec_ioctl ()
#12 0xc0175189 in spec_vnoperate ()
#13 0xc0209a3d in ufs_vnoperatespec ()
#14 0xc0171a44 in vn_ioctl ()
#15 0xc014d10e in ioctl ()
#16 0xc0257869 in syscall2 ()
#17 0xc0249b95 in Xint0x80_syscall ()
#18 0x8091059 in ?? ()
#19 0x80955c7 in ?? ()
#20 0x87760dd in ?? ()
#21 0x8758e32 in ?? ()
#22 0x806b7cc in ?? ()
#23 0x80bab12 in ?? ()
#24 0x806b045 in ?? ()
(kgdb) quit

I'll provide more information if I can.  As I write this, I have a new
kernel compiling that includes the debugging symbols and the debugger
itself (my existing copy lacks these items).


-- 
 /~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~___~~~~~\
|  natedac@kscable.com              //Z@|___  |
|  http://home.kscable.com/natedac |'(__ [_<  |
 \_C64/C128_-_What's_*YOUR*_hobby?__\___|____/
Comment 2 Nate Dannenberg 2001-03-27 19:53:25 UTC
I've managed to get some new info about this problem.  I got a proper
debug kernel built and have extracted some information from it using gdb
as per the handbook.  You may want to delete the panic message and other
information I posted earlier; it's incomplete anyway.

bash-2.04# gdb -k kernel.debug /usr/crash/vmcore.2
GNU gdb 4.18
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-unknown-freebsd"...
IdlePTD 3457024
initial pcb at 2ba4c0
panicstr: general protection fault
panic messages:
---
Fatal trap 9: general protection fault while in kernel mode
instruction pointer     = 0x8:0xc024b4b3
stack pointer           = 0x10:0xc76b4cf0
frame pointer           = 0x10:0xc76b4d10
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = resume, IOPL = 3
current process         = 330 (XFree86)
interrupt mask          = none
trap number             = 9
panic: general protection fault

syncing disks... 33 32 23 12
done
Uptime: 3m14s
dumping to dev #ad/0x20001, offset 205184
dump ata0: resetting devices .. done
95 94 93 92 91 90 89 88 87 86 85 84 83 82 81 80 79 78 77 76 75 74 73 72 71 70 69
68 67 66 65 64 63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42
41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
---
#0  dumpsys () at ../../kern/kern_shutdown.c:469
469             if (dumping++) {
(kgdb) where
#0  dumpsys () at ../../kern/kern_shutdown.c:469
#1  0xc013e9db in boot (howto=256) at ../../kern/kern_shutdown.c:309
#2  0xc013ed58 in poweroff_wait (junk=0xc028e1c5, howto=-957278880)
    at ../../kern/kern_shutdown.c:556
#3  0xc02575cd in trap_fatal (frame=0xc76b4cb0, eva=0)
    at ../../i386/i386/trap.c:951
#4  0xc0256fdb in trap (frame={tf_fs = 16, tf_es = 16, tf_ds = 16,
      tf_edi = -1, tf_esi = -1, tf_ebp = -949269232, tf_isp = -949269284,
      tf_ebx = -1, tf_edx = -1, tf_ecx = 592, tf_eax = -1, tf_trapno = 9,
      tf_err = 0, tf_eip = -1071336269, tf_cs = 8, tf_eflags = 77974,
      tf_esp = 65535, tf_ss = 0}) at ../../i386/i386/trap.c:613
#5  0xc024b4b3 in i686_mrstoreone (arg=0xc02d7ed4)
    at ../../i386/i386/i686_mem.c:290
#6  0xc024b3f9 in i686_mrstore (sc=0xc02d7ed4)
    at ../../i386/i386/i686_mem.c:253
#7  0xc024ba31 in i686_mrset (sc=0xc02d7ed4, mrd=0xc09d81e0, arg=0xc76b4eb0)
    at ../../i386/i386/i686_mem.c:489
#8  0xc0252a91 in mem_range_attr_set (mrd=0xc09d81e0, arg=0xc76b4eb0)
    at ../../i386/i386/mem.c:439
#9  0xc0252a0d in mem_ioctl (dev=0xc02b8ebc, cmd=2148298035,
    data=0xc76b4eac "\bù¿¿", flags=3, p=0xc6f11560)
    at ../../i386/i386/mem.c:402
#10 0xc02528c4 in mmioctl (dev=0xc02b8ebc, cmd=2148298035,
    data=0xc76b4eac "\bù¿¿", flags=3, p=0xc6f11560)
    at ../../i386/i386/mem.c:338
#11 0xc017544e in spec_ioctl (ap=0xc76b4de8)
    at ../../miscfs/specfs/spec_vnops.c:306
#12 0xc0175179 in spec_vnoperate (ap=0xc76b4de8)
    at ../../miscfs/specfs/spec_vnops.c:119
#13 0xc0209a09 in ufs_vnoperatespec (ap=0xc76b4de8)
    at ../../ufs/ufs/ufs_vnops.c:2391
#14 0xc0171a34 in vn_ioctl (fp=0xc0acd740, com=2148298035,
    data=0xc76b4eac "\bù¿¿", p=0xc6f11560) at vnode_if.h:429
#15 0xc014d0fe in ioctl (p=0xc6f11560, uap=0xc76b4f80) at ../../sys/file.h:178
#16 0xc0257879 in syscall2 (frame={tf_fs = 47, tf_es = 47, tf_ds = 47,
      tf_edi = 983040, tf_esi = 65536, tf_ebp = -1077937884,
      tf_isp = -949268524, tf_ebx = -1077937912, tf_edx = 673607984,
      tf_ecx = 141856768, tf_eax = 54, tf_trapno = 12, tf_err = 2,
      tf_eip = 673244900, tf_cs = 31, tf_eflags = 12951, tf_esp = -1077937976,
      tf_ss = 47}) at ../../i386/i386/trap.c:1150
#17 0xc0249ba5 in Xint0x80_syscall ()
#18 0x8091059 in ?? ()
#19 0x80955c7 in ?? ()
#20 0x872a0dd in ?? ()
#21 0x8710e32 in ?? ()
#22 0x806b7cc in ?? ()
#23 0x80bab12 in ?? ()
#24 0x806b045 in ?? ()
(kgdb) up 4
#4  0xc0256fdb in trap (frame={tf_fs = 16, tf_es = 16, tf_ds = 16,
      tf_edi = -1, tf_esi = -1, tf_ebp = -949269232, tf_isp = -949269284,
      tf_ebx = -1, tf_edx = -1, tf_ecx = 592, tf_eax = -1, tf_trapno = 9,
      tf_err = 0, tf_eip = -1071336269, tf_cs = 8, tf_eflags = 77974,
      tf_esp = 65535, tf_ss = 0}) at ../../i386/i386/trap.c:613
613                     trap_fatal(&frame, eva);
(kgdb) quit
Comment 3 Nate Dannenberg 2001-03-27 23:02:52 UTC
The problem has been solved by a workaround in the kernel source, thanks 
to the help of some of the guys on #BSDcode on EFnet (hi BigSpoon!).  
Apparently MTRR doesn't work on this particular processor, an AMD 550 MHz 
Athlon, or the kernel is somehow detecting it wrong.  XFree86 
4.4.x obviously is trying to use this feature somehow, and that was 
causing a kernel panic.

Temporary Workaround:

At line 572 of /sys/i386/i386/i686_mem.c, comment out the check that 
attempts to detect the MTRR facility, like this:

---- snip ----

static void
i686_mem_drvinit(void *unused)
   
{

    /* Try for i686 MTRRs */
/*
    if ((cpu_feature & CPUID_MTRR) &&
        ((cpu_id & 0xf00) == 0x600) &&
        ((strcmp(cpu_vendor, "GenuineIntel") == 0) ||
        (strcmp(cpu_vendor, "AuthenticAMD") == 0))) {
        mem_range_softc.mr_op = &i686_mrops;
    }
  
*/
    
}
Comment 4 Nate Dannenberg 2001-03-27 23:14:57 UTC
On Tue, 27 Mar 2001 natedac@kscable.com wrote:

> Athlon, or the kernel is somehow detecting it wrong.  XFree86
> 4.4.x obviously is trying to use this feature somehow, and that was
> causing a kernel panic.

Erm... make that 4.0.x (i.e. 4.0.2, 4.0.3)

Comment 5 Nate Dannenberg 2001-03-28 07:51:49 UTC
Thanks to lots of help from BigSpoon on #BSDcode/EFnet (jhb@FreeBSD.org),
here's a more proper way to fix the MTRR problem that plagues some machines
like mine.  This will allow the MTRR support to be disabled at will either
in /boot/loader.conf or at the boot prompt.  It appears that today the
copy of this file sitting on ftp.freebsd.org has been changed slightly (so
a patch/diff against my week-old copy would be of little use anyways ;-)

Anyways, here we go:

Edit /usr/src/sys/i386/i386/i686_mem.c

Somewhere near the start of the file, add these two lines:

------------------------------------------------------------------------------
   static int mtrrs_disabled;
   TUNABLE_INT_DECL("machdep.mtrrs_disabled", 0, mtrrs_disabled);
------------------------------------------------------------------------------

Then, go to the end of the file, and make these changes to 
i686_mem_drvinit()...

------------------------------------------------------------------------------
   static void
   i686_mem_drvinit(void *unused)
   {

       /* First, check if the user wants to allow MTRR at all */
       if (mtrrs_disabled)
           return;

       /* Ok, MTRR is allowed, so try for i686 MTRRs */
       if ((cpu_feature & CPUID_MTRR) &&
           ((cpu_id & 0xf00) == 0x600) &&
           ((strcmp(cpu_vendor, "GenuineIntel") == 0) ||
           (strcmp(cpu_vendor, "AuthenticAMD") == 0))) {
           mem_range_softc.mr_op = &i686_mrops;
       }
   }
-----------------------------------------------------------------------------

Finally, add a line to /boot/loader.conf to disable the MTRR support if 
you need to (leaving it out or mis-spelling the variable name will cause 
MTRR support to be detected and enabled normally):

machdep.mtrrs_disabled = 1

I guess at this point, the original PR should remain open until the bug is
solved in a more formal manner, but this seems to be an effective
workaround.  So far, I haven't seen any ill effects from disabling MTRR's.
Comment 6 Mario Sergio Fujikawa Ferreira freebsd_committer freebsd_triage 2001-04-11 03:34:46 UTC
Responsible Changed
From-To: freebsd-ports->jmz

Over to maintainer.
Comment 7 John Baldwin freebsd_committer freebsd_triage 2001-08-02 22:21:40 UTC
Responsible Changed
From-To: jmz->jhb

This is more about needing to disable MTRRs on buggy processors, which is a 
kernel issue, not a ports issue.
Comment 8 John Baldwin freebsd_committer freebsd_triage 2001-11-16 23:29:23 UTC
Responsible Changed
From-To: jhb->msmith

Mike wants it so he can fix the missing holes in our K7 MTRR support.
Comment 9 dwmalone freebsd_committer freebsd_triage 2002-04-29 09:26:44 UTC
State Changed
From-To: open->closed

I believe the patches that I recently committed to -current and -stable 
should resolve this issue.