Bug 113430 - Kernel Panic with emulators/qemu on AMD64 SMP
Status: Closed FIXED
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s)
Version: Latest
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: Juergen Lock
 
Reported: 2007-06-06 21:40 UTC by Allan Jude
Modified: 2008-05-12 20:10 UTC

Description Allan Jude 2007-06-06 21:40:04 UTC
When I use ports/emulators/qemu with ports/emulators/kqemu-kmod under an AMD64 SMP kernel, the host panics (trap 12, supervisor read, page not present).

It works fine under i386 SMP and under AMD64 UP.

I have tried the qemu/kqemu binaries from pkg_add, builds compiled from source, and the latest qemu-devel.

How-To-Repeat: start qemu-system-x86_64 with kqemu enabled (either with -kernel-kqemu, or simply without -no-kqemu).

The system panics immediately.
Comment 1 Allan Jude 2007-06-06 21:56:56 UTC
I tested i386 SMP and AMD64 UP on the exact same system, not on
different systems. I also tried AMD64 SMP with machdep.hlt_cpus=2
(to halt the second CPU and leave just the first running), and it still
crashed.
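
For readers unfamiliar with the sysctl: machdep.hlt_cpus takes a bitmap of CPUs to halt, so the value 2 selects CPU 1, the second CPU. A minimal sketch of decoding such a mask (the helper name `halted_cpus` is made up for illustration and is not part of FreeBSD):

```python
def halted_cpus(mask):
    """Return the CPU ids whose bit is set in a hlt_cpus-style bitmap."""
    return [cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1]


# The value used in the test above: only bit 1 is set.
print(halted_cpus(2))
```

A mask of 0b101 would select CPUs 0 and 2 in the same way.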
Comment 2 Edwin Groothuis freebsd_committer freebsd_triage 2007-06-06 22:55:25 UTC
Responsible Changed
From-To: freebsd-ports-bugs->nox

Over to maintainer
Comment 3 Juergen Lock freebsd_committer freebsd_triage 2007-06-06 23:41:29 UTC
State Changed
From-To: open->feedback

Hmm, a backtrace may be useful. This may be a little tricky since kqemu
is a kld; maybe you can use the scripts in src/tools/debugscripts or,
failing that, the KDB_TRACE kernel option.
Comment 4 Allan Jude 2007-06-07 20:03:09 UTC
I just ran another test with a debug kernel (GENERIC SMP plus KDB,
KDB_TRACE, DDB, GDB).

I got another kernel panic, trap 12; the instruction pointer was:
0xffffffff804383f2

nm -n /boot/debug/kernel | grep ffffffff804383
gives:
======================================================================
ffffffff80438300 T taskqueue_create_fast
ffffffff80438320 T taskqueue_enqueue_fast
ffffffff80438330 t taskqueue_fast_enqueue
ffffffff80438350 t taskqueue_fast_run
ffffffff80438370 t taskqueue_define_fast
ffffffff804383d0 T userret
======================================================================
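
The lookup above can also be done mechanically: take the last symbol whose address is not greater than the faulting instruction pointer and subtract. The sketch below does that with the nm lines quoted above. It assumes the address really lies inside the last listed symbol, since the grep output does not show what follows userret; the helper names `parse_nm` and `resolve` are illustrative only.

```python
import bisect

# The nm -n lines quoted above, verbatim.
NM_OUTPUT = """\
ffffffff80438300 T taskqueue_create_fast
ffffffff80438320 T taskqueue_enqueue_fast
ffffffff80438330 t taskqueue_fast_enqueue
ffffffff80438350 t taskqueue_fast_run
ffffffff80438370 t taskqueue_define_fast
ffffffff804383d0 T userret
"""


def parse_nm(text):
    """Turn 'address type name' lines into a sorted list of (address, name)."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip anything that is not a plain symbol line
        addr, _sym_type, name = parts
        symbols.append((int(addr, 16), name))
    symbols.sort()
    return symbols


def resolve(symbols, addr):
    """Return (name, offset) of the nearest symbol at or below addr, or None."""
    addrs = [a for a, _ in symbols]
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        return None
    base, name = symbols[i]
    return name, addr - base


name, offset = resolve(parse_nm(NM_OUTPUT), 0xffffffff804383f2)
print(f"{name}+{offset:#x}")
```

For the instruction pointer from this panic, the result is an offset of 0x22 into userret.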


I ran kgdb on the vmcore that was generated:
======================================================================

Unread portion of the kernel message buffer:


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x202
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xffffffff804383f2
stack pointer           = 0x10:0xffffffffb38f5ba0
frame pointer           = 0x10:0xffffffffb38f5d10
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 818 (qemu-system-x86_64)
panic: from debugger
cpuid = 0
KDB: stack backtrace:
Uptime: 3m48s
Dumping 4078 MB (3 chunks)
  chunk 0: 1MB (155 pages) ... ok
  chunk 1: 3310MB (847280 pages) 3294 3278 3262 3246 3230 3214 3198 3182
3166 3150 3134 3118 3102 3086 3070 3054 3038 3022 3006 2990 2974 2958
2942 2926 2910 2894 2878 2862 2846 2830 2814 2798 2782 2766 2750 2734
2718 2702 2686 2670 2654 2638 2622 2606 2590 2574 2558 2542 2526 2510
2494 2478 2462 2446 2430 2414 2398 2382 2366 2350 2334 2318 2302 2286
2270 2254 2238 2222 2206 2190 2174 2158 2142 2126 2110 2094 2078 2062
2046 2030 2014 1998 1982 1966 1950 1934 1918 1902 1886 1870 1854 1838
1822 1806 1790 1774 1758 1742 1726 1710 1694 1678 1662 1646 1630 1614
1598 1582 1566 1550 1534 1518 1502 1486 1470 1454 1438 1422 1406 1390
1374 1358 1342 1326 1310 1294 1278 1262 1246 1230 1214 1198 1182 1166
1150 1134 1118 1102 1086 1070 1054 1038 1022 1006 990 974 958 942 926
910 894 878 862 846 830 814 798 782 766 750 734 718 702 686 670 654 638
622 606 590 574 558 542 526 510 494 478 462 446 430 414 398 382 366 350
334 318 302 286 270 254 238 222 206 190 174 158 142 126 110 94 78 62 46
30 14 ... ok
  chunk 2: 768MB (196608 pages) 753 737 721 705 689 673 657 641 625 609
593 577 561 545 529 513 497 481 465 449 433 417 401 385 369 353 337 321
305 289 273 257 241 225 209 193 177 161 145 129 113 97 81 65 49 33 17 1

#0  doadump () at pcpu.h:172
172             __asm __volatile("movq %%gs:0,%0" : "=r" (td));
(kgdb) getsyms
During symbol reading, Incomplete CFI data; unspecified registers at
0xffffffff8040e0dc.
Id Refs Address    Size     Name
 1    7 0x80100000 9bbec8   kernel
 2    4 0xb6624000 8472     netgraph.ko
 3    1 0xb662d000 12fd     ng_ether.ko
 4    1 0xb662f000 2da9     ng_pppoe.ko
 5    1 0xb6632000 1bad     ng_socket.ko
 6    1 0xb6634000 4a07     aio.ko
 7    1 0xb6639000 276da    kqemu.ko
Select the list above with the mouse, paste into the screen
and then press ^D.  Yes, this is annoying.
 1    7 0x80100000 9bbec8   kernel
 2    4 0xb6624000 8472     netgraph.ko
 3    1 0xb662d000 12fd     ng_ether.ko
 4    1 0xb662f000 2da9     ng_pppoe.ko
 5    1 0xb6632000 1bad     ng_socket.ko
 6    1 0xb6634000 4a07     aio.ko
 7    1 0xb6639000 276da    kqemu.ko
add symbol table from file
"/usr/obj/usr/src/sys/DEBUG/modules/usr/src/sys/modules/aio/aio.ko.debug" at
        .text_addr = 0xb6634000
        .data_addr = 0xb6634000
        .bss_addr = 0xb6634000
add symbol table from file
"/usr/obj/usr/src/sys/DEBUG/modules/usr/src/sys/modules/netgraph/ether/ng_ether.ko.debug"
at
        .text_addr = 0xb662d000
        .data_addr = 0xb662d000
        .bss_addr = 0xb662d000
add symbol table from file
"/usr/obj/usr/src/sys/DEBUG/modules/usr/src/sys/modules/netgraph/netgraph/netgraph.ko.debug"
at
        .text_addr = 0xb6624000
        .data_addr = 0xb6624000
        .bss_addr = 0xb6624000
add symbol table from file
"/usr/obj/usr/src/sys/DEBUG/modules/usr/src/sys/modules/netgraph/pppoe/ng_pppoe.ko.debug"
at
        .text_addr = 0xb662f000
        .data_addr = 0xb662f000
        .bss_addr = 0xb662f000
add symbol table from file
"/usr/obj/usr/src/sys/DEBUG/modules/usr/src/sys/modules/netgraph/socket/ng_socket.ko.debug"
at
        .text_addr = 0xb6632000
        .data_addr = 0xb6632000
        .bss_addr = 0xb6632000
(kgdb) where
#0  doadump () at pcpu.h:172
During symbol reading, Incomplete CFI data; unspecified registers at
0xffffffff8040e0dc.
#1  0xffffffff8040e735 in boot (howto=0x104) at
/usr/src/sys/kern/kern_shutdown.c:409
#2  0xffffffff8040ee45 in panic (fmt=0xffffff00c3965000 "°6\211¿") at
/usr/src/sys/kern/kern_shutdown.c:565
#3  0xffffffff801b0312 in db_panic (addr=0x0, have_addr=0x0, count=0x0,
modif=0x0) at /usr/src/sys/ddb/db_command.c:438
#4  0xffffffff801b0855 in db_command_loop () at
/usr/src/sys/ddb/db_command.c:350
#5  0xffffffff801b277d in db_trap (type=0xb38f5930, code=0x0) at
/usr/src/sys/ddb/db_main.c:222
#6  0xffffffff8042e329 in kdb_trap (type=0xc, code=0x0,
tf=0xffffffffb38f5af0) at /usr/src/sys/kern/subr_kdb.c:473
#7  0xffffffff80650975 in trap_fatal (frame=0xffffffffb38f5af0,
eva=0xffffff00c3965000)
    at /usr/src/sys/amd64/amd64/trap.c:651
#8  0xffffffff80650d03 in trap_pfault (frame=0xffffffffb38f5af0,
usermode=0x0) at /usr/src/sys/amd64/amd64/trap.c:573
#9  0xffffffff80650f5d in trap (frame=
      {tf_rdi = 0xffffff012f655720, tf_rsi = 0x4, tf_rdx = 0x46, tf_rcx
= 0xffffffff8063c05b, tf_r8 = 0xffffffff8094e768, tf_r9 =
0xffffff012f655720, tf_rax = 0x2, tf_rbx = 0xf4240, tf_rbp =
0xffffffffb38f5d10, tf_r10 = 0xffffff012b39e108, tf_r11 = 0x2, tf_r12 =
0xffffff012f655720, tf_r13 = 0xffffffffb38f5bd0, tf_r14 = 0x0, tf_r15 =
0xffffffff801c4cd0, tf_trapno = 0x4, tf_addr = 0x2, tf_flags =
0xfffffffd, tf_err = 0x0, tf_rip = 0xffffffff804383f2, tf_cs = 0x8,
tf_rflags = 0x10282, tf_rsp = 0xffffffffb38f5bb0, tf_ss =
0xffffffff806468c8}) at /usr/src/sys/amd64/amd64/trap.c:352
#10 0xffffffff80640eca in lapic_handle_timer (frame=
      {cf_rdi = 0xffffffff8094e768, cf_rsi = 0xffffff012f655720, cf_rdx
= 0x2, cf_rcx = 0xf4240, cf_r8 = 0xffffffffb38f5d10, cf_r9 =
0xffffff012b39e108, cf_rax = 0x2, cf_rbx = 0xffffff012f655720, cf_rbp =
0xffffffffb38f5bd0, cf_r10 = 0x0, cf_r11 = 0xffffffff801c4cd0, cf_r12 =
0x4, cf_r13 = 0x2, cf_r14 = 0xfffffffd, cf_r15 = 0x0, cf_rip =
0xffffffff806468c8, cf_cs = 0x8, cf_rflags = 0x202, cf_rsp =
0xffffffffb38f5bd0, cf_ss = 0x10})
    at /usr/src/sys/amd64/amd64/local_apic.c:657
#11 0xffffffff8063c05b in Xcpustop () at apic_vector.S:282
#12 0xffffffff806468c8 in mp_grab_cpu_hlt () at
/usr/src/sys/amd64/amd64/mp_machdep.c:1226
#13 0x000000000000000c in __set_modmetadata_set_sym__mod_metadata_md_aio ()
Cannot access memory at address 0x8
(kgdb) quit
======================================================================
Comment 5 Juergen Lock 2007-06-07 23:02:47 UTC
On Thu, Jun 07, 2007 at 07:10:14PM +0000, Allan Jude wrote:
>[...]
>  got another kernel panic, trap 12, instruction pointer was:
>  0xffffffff804383f2

Hmm, can you do an `i li *0xffffffff804383f2' (short for `info line') in kgdb?
Comment 6 dukemaster 2007-06-08 15:44:20 UTC
Line 82 of "/usr/src/sys/kern/subr_trap.c" starts at address
0xffffffff804383f2 <userret+34>
   and ends at 0xffffffff804383f5 <userret+37>.



before:

Line 81 of "/usr/src/sys/kern/subr_trap.c" starts at address
0xffffffff804383d0 <userret>
   and ends at 0xffffffff804383f2 <userret+34>.


after:

Line 81 of "/usr/src/sys/kern/subr_trap.c" starts at address
0xffffffff804383f5 <userret+37>
   and ends at 0xffffffff804383f8 <userret+40>.
Comment 7 Allan Jude 2007-06-08 16:03:16 UTC
I recreated it again, and the 'stopped at' in the kernel panic is:

userret+0x22	movq	0(%rdi),%rbx
Comment 8 Juergen Lock 2007-06-08 21:24:50 UTC
On Fri, Jun 08, 2007 at 03:10:10PM +0000, Allan Jude wrote:
>  I recreated it again, and the 'stopped at' in the kernel panic is:
>  
>  userret+0x22	movq	0(%rdi),%rbx

Ok, so apparently userret was called with a bogus td arg; can you find
out from where?  (There should be a return address on the stack. userret
here starts with a sub $0x28,%rsp (hmm, no frame pointer?), so add that,
or whatever yours subtracts, to the stack pointer at the time of the fault.)
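
A rough sketch of that arithmetic, under some assumptions: the stack pointer is the one printed in the trap message (0xffffffffb38f5ba0), the fault happened after the sub had executed, nothing was pushed before it, and the frame size is 0x28 as in my build (yours may differ). The function name `return_address_slot` is made up for illustration.

```python
FAULT_RSP = 0xffffffffb38f5ba0  # "stack pointer" from the trap message
FRAME_SIZE = 0x28               # from "sub $0x28,%rsp"; may differ per build


def return_address_slot(rsp, frame_size):
    """Address of the stack slot holding the caller's return address,
    assuming the prologue only did 'sub $frame_size,%rsp'."""
    return (rsp + frame_size) & 0xFFFFFFFFFFFFFFFF


slot = return_address_slot(FAULT_RSP, FRAME_SIZE)
# Print a gdb examine command for that slot (one 8-byte word, in hex).
print(f"x/gx {slot:#x}")
```

The value read from that slot can then be passed to `info symbol` in kgdb to name the caller, provided the assumptions above hold.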

 Btw,

>  fault virtual address   = 0x202
>  fault code              = supervisor read, page not present
>[...]
>  #9  0xffffffff80650f5d in trap (frame=
>        {tf_rdi = 0xffffff012f655720, tf_rsi = 0x4, tf_rdx = 0x46, tf_rcx
>[...]

 Shouldn't tf_rdi here be rdi at the time of the fault, i.e. 0x202?
Anyone know why it's different?  Also, as mentioned above, userret doesn't
save a frame pointer here (rbp) and indeed,

>  0xffffff012f655720, tf_rax = 0x2, tf_rbx = 0xf4240, tf_rbp =
>  0xffffffffb38f5d10, tf_r10 = 0xffffff012b39e108, tf_r11 = 0x2, tf_r12 =
>[...]
>  tf_rflags = 0x10282, tf_rsp = 0xffffffffb38f5bb0, tf_ss =

 tf_rbp seems to be way off compared to tf_rsp; are parts of the kernel
now compiled with -fomit-frame-pointer?  (Even for a debug kernel?)
This may explain why we don't see who called userret in the kgdb backtrace...
Comment 9 Juergen Lock 2007-07-14 22:38:41 UTC
Can you please check if this is still a problem with the current port?
(It may have been caused by the kld not being compiled with SMP defined.)
Comment 10 Mark Linimon freebsd_committer freebsd_triage 2008-03-03 06:43:06 UTC
State Changed
From-To: feedback->closed

Feedback timeout (> 6 months).
Comment 11 Mark Linimon freebsd_committer freebsd_triage 2008-03-05 05:33:02 UTC
State Changed
From-To: closed->suspended

Assignee notes that the problem really still exists, but is very 
difficult to reproduce.  Reopen.
Comment 12 dfilter service freebsd_committer freebsd_triage 2008-05-01 14:29:23 UTC
nox         2008-05-01 13:29:16 UTC

  FreeBSD ports repository

  Modified files:
    emulators/kqemu-kmod Makefile 
  Added files:
    emulators/kqemu-kmod/files patch-common-Makefile 
                               patch-tssworkaround 
  Log:
  - Add a workaround for the amd64 SMP shared gdt issue that caused the
    host panics - longer explanation in this post:
          http://docs.freebsd.org/cgi/mid.cgi?20080501101951.GA30274 [1]
  - Get rid of superfluous "kqemu " in IGNORE message when kernel source
    is missing
  - Pass down DEBUG_FLAGS to the build
  - Bump PORTREVISION
  
  PR:             ports/113430 [1]
  
  Revision  Changes    Path
  1.23      +4 -2      ports/emulators/kqemu-kmod/Makefile
  1.1       +22 -0     ports/emulators/kqemu-kmod/files/patch-common-Makefile (new)
  1.1       +70 -0     ports/emulators/kqemu-kmod/files/patch-tssworkaround (new)
_______________________________________________
cvs-all@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/cvs-all
To unsubscribe, send any mail to "cvs-all-unsubscribe@freebsd.org"
Comment 13 Juergen Lock freebsd_committer freebsd_triage 2008-05-01 14:59:31 UTC
State Changed
From-To: suspended->closed

Workaround committed. Thanks!
Comment 14 dfilter service freebsd_committer freebsd_triage 2008-05-12 20:09:57 UTC
nox         2008-05-12 19:09:52 UTC

  FreeBSD ports repository

  Modified files:
    emulators/kqemu-kmod Makefile 
    emulators/kqemu-kmod/files patch-tssworkaround 
  Log:
  - Fix multiple qemu processes on amd64 SMP by actually using separate
    per-cpu gdts (the previous fix was only stable for one qemu process
    at a time)
    Relevant thread:
          http://lists.freebsd.org/pipermail/freebsd-emulation/2008-May/004902.html
  - Bump PORTREVISION
  
  PR:             ports/113430
  
  Revision  Changes    Path
  1.25      +1 -1      ports/emulators/kqemu-kmod/Makefile
  1.3       +49 -8     ports/emulators/kqemu-kmod/files/patch-tssworkaround