Summary: | [xen] [panic] FreeBSD 10 XENHVM panic under NetBSD Dom0 (xn_txeof: WARNING: response is -1) | ||
---|---|---|---|
Product: | Base System | Reporter: | miguelmclara |
Component: | kern | Assignee: | freebsd-xen (Nobody) <xen> |
Status: | Closed FIXED | ||
Severity: | Affects Many People | CC: | hugo |
Priority: | Normal | ||
Version: | Unspecified | ||
Hardware: | Any | ||
OS: | Any |
Description
miguelmclara
2014-04-08 01:00:00 UTC
NOTE: disabling TSO (ifconfig xn0 -tso) stops the panic, but ssh still hangs and the xn_txeof warning still shows! I tried enabling and disabling other features (LRO, tx/rx csum) but I see no change!

There's another report about this here: http://www.freebsd.org/cgi/query-pr.cgi?pr=182903 I was browsing through my reports on the xen list and noticed I even referred to it at the time. Should this be closed as a duplicate?

Also, I've been reporting and testing this since at least June 2013 (see http://thr3ads.net/xen-users/2013/06/2653989-Re-FreeBSD-PVHVM-call-for-testing), long before FreeBSD-10 was released. And the PR I mention is from Oct 2013. Both reports are marked as non-critical and low-priority. Should I assume FreeBSD is no longer going to be supported on NetBSD Dom0s? Please don't take this the wrong way, I understand not everything can be done; I just want to know if I should forget about this and stay with FreeBSD-9 or migrate the Dom0 to Linux! Thanks

State Changed From-To: open->open
Over to maintainer(s).

Responsible Changed From-To: freebsd-bugs->freebsd-xen

NOTE: NetBSD just announced support for Xen 4.5. I've tested with FreeBSD 9.3, which works fine, but anything 10.0+ fails with this error! And since FreeBSD 10 comes with XENHVM precompiled/loaded, any FreeBSD 10+ guest starts in a non-working state! :(

I should clarify that last statement: it starts in a non-working state with regard to networking. Thanks.

Dunno if this will help in any way... I lack C skills and I certainly don't know much about the Xen code... In any case I will post my findings: I decided to compare the changes between 9 and 10, because 9 is working for me.
I noticed that the warning comes from netfront.c:

    if (txr->status != NETIF_RSP_OKAY) {
        printf("%s: WARNING: response is %d!\n", __func__, txr->status);
    }

So I decided to try to find out what "NETIF_RSP_OKAY" is, which appears to come from sys/xen/interface/io/netif.h. This is the diff from the 9 to the 10 (stable) src; could any of these changes be the cause?

    % diff -u sys/xen/interface/io/netif.h /tmp/netif.h
    --- sys/xen/interface/io/netif.h 2014-04-11 01:41:58.000000000 +0100
    +++ /tmp/netif.h 2015-01-31 06:42:58.000000000 +0000
    @@ -41,7 +41,7 @@
     /*
      * This is the 'wire' format for packets:
      * Request 1: netif_tx_request -- NETTXF_* (any flags)
    - * [Request 2: netif_tx_extra] (only if request 1 has NETTXF_extra_info)
    + * [Request 2: netif_tx_extra] (only if request 1 has NETTXF_extra_info)
      * [Request 3: netif_tx_extra] (only if request 2 has XEN_NETIF_EXTRA_FLAG_MORE)
      * Request 4: netif_tx_request -- NETTXF_more_data
      * Request 5: netif_tx_request -- NETTXF_more_data
    @@ -173,6 +173,10 @@
     #define _NETRXF_extra_info (3)
     #define NETRXF_extra_info (1U<<_NETRXF_extra_info)

    +/* GSO Prefix descriptor. */
    +#define _NETRXF_gso_prefix (4)
    +#define NETRXF_gso_prefix (1U<<_NETRXF_gso_prefix)
    +
     struct netif_rx_response {
     uint16_t id;
     uint16_t offset; /* Offset in page of start of received packet */

PARTICULARLY THIS:

    +/* GSO Prefix descriptor. */
    +#define _NETRXF_gso_prefix (4)
    +#define NETRXF_gso_prefix (1U<<_NETRXF_gso_prefix)
    +

GSO is not supported by the NetBSD backend. I had the same issue with the Windows GPLPV drivers a while back (see: http://mail-index.netbsd.org/port-xen/2013/12/12/msg008172.html)

One thing to note is that the panic is not related to the "mbuf already on the free list" anymore:

    Fatal trap 12: page fault while in kernel mode
    cpuid = 0; apic id = 00
    fault virtual address = 0x3
    fault code = supervisor read data, page not present
    instruction pointer = 0x20:0xffffffff80800c65
    stack pointer = 0x28:0xfffffe00002b29d0
    frame pointer = 0x28:0xfffffe00002b2a20
    code segment = base rx0, limit 0xfffff, type 0x1b
     = DPL 0, pres 1, long 1, def32 0, gran 1
    processor eflags = interrupt enabled, resume, IOPL = 0
    current process = 12 (irq771: xn0)
    trap number = 12
    panic: page fault
    cpuid = 0
    KDB: stack backtrace:
    #0 0xffffffff8095b050 at kdb_backtrace+0x60
    #1 0xffffffff8091f745 at panic+0x155
    #2 0xffffffff80d1edff at trap_fatal+0x38f
    #3 0xffffffff80d1f118 at trap_pfault+0x308
    #4 0xffffffff80d1e77a at trap+0x47a
    #5 0xffffffff80d04492 at calltrap+0x8
    #6 0xffffffff808027e4 at xn_intr+0x74
    #7 0xffffffff808f255b at intr_event_execute_handlers+0xab
    #8 0xffffffff808f29a6 at ithread_loop+0x96
    #9 0xffffffff808f017a at fork_exit+0x9a
    #10 0xffffffff80d049ce at fork_trampoline+0xe

As requested on the freebsd-xen mailing list, this is from a core dump after reboot (saved to /var/crash):

    KDB: stack backtrace:
    #0 0xffffffff8095b070 at kdb_backtrace+0x60
    #1 0xffffffff8091f765 at panic+0x155
    #2 0xffffffff80d1ee1f at trap_fatal+0x38f
    #3 0xffffffff80d1f138 at trap_pfault+0x308
    #4 0xffffffff80d1e79a at trap+0x47a
    #5 0xffffffff80d044b2 at calltrap+0x8
    #6 0xffffffff80802804 at xn_intr+0x74
    #7 0xffffffff808f257b at intr_event_execute_handlers+0xab
    #8 0xffffffff808f29c6 at ithread_loop+0x96
    #9 0xffffffff808f019a at fork_exit+0x9a
    #10 0xffffffff80d049ee at fork_trampoline+0xe
    Uptime: 54s
    Dumping 77 out of 727 MB:..21%..42%..63%..83%
    #0 doadump (textdump=<value optimized out>) at pcpu.h:219
    219 pcpu.h: No such file or directory.
     in pcpu.h
    (kgdb) i li *xn_intr+0x74
    Line 1209 of "/usr/src/sys/dev/xen/netfront/netfront.c" starts at address 0xffffffff808027fc <xn_intr+108> and ends at 0xffffffff80802809 <xn_intr+121>.
    Current language: auto; currently minimal

If relevant, here is the bt:

    (kgdb) bt
    #0 doadump (textdump=<value optimized out>) at pcpu.h:219
    #1 0xffffffff8091f3e2 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:452
    #2 0xffffffff8091f7a4 in panic (fmt=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:759
    #3 0xffffffff80d1ee1f in trap_fatal (frame=<value optimized out>, eva=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:865
    #4 0xffffffff80d1f138 in trap_pfault (frame=0xfffffe00002b2920, usermode=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:676
    #5 0xffffffff80d1e79a in trap (frame=0xfffffe00002b2920) at /usr/src/sys/amd64/amd64/trap.c:440
    #6 0xffffffff80d044b2 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:236
    #7 0xffffffff80800c82 in xn_txeof (np=0xfffffe00009e4000) at /usr/src/sys/dev/xen/netfront/netfront.c:1137
    #8 0xffffffff80802804 in xn_intr (xsc=0xfffffe00009e4000) at /usr/src/sys/dev/xen/netfront/netfront.c:1209
    #9 0xffffffff808f257b in intr_event_execute_handlers (p=<value optimized out>, ie=0xfffff800025db700) at /usr/src/sys/kern/kern_intr.c:1264
    #10 0xffffffff808f29c6 in ithread_loop (arg=0xfffff800025d5520) at /usr/src/sys/kern/kern_intr.c:1277
    #11 0xffffffff808f019a in fork_exit (
    ---Type <return> to continue, or q <return> to quit---
        callout=0xffffffff808f2930 <ithread_loop>, arg=0xfffff800025d5520, frame=0xfffffe00002b2c00) at /usr/src/sys/kern/kern_fork.c:996
    #12 0xffffffff80d049ee in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:611
    #13 0x0000000000000000 in ?? ()

Is the kgdb info helpful? Is there any extra info I can add?
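For anyone reading the "response is -1" warning: the number is the status field of a netif TX response. Below is a minimal sketch that maps the status values to their names, assuming the values are the ones defined in Xen's public io/netif.h (NETIF_RSP_DROPPED, NETIF_RSP_ERROR, NETIF_RSP_OKAY, NETIF_RSP_NULL). The helper netif_rsp_name() is hypothetical and is not part of netfront.c; it only illustrates that -1 would correspond to NETIF_RSP_ERROR, i.e. the backend reporting an error for that request.

```c
#include <stdint.h>

/*
 * Response status values, assumed to match Xen's public io/netif.h.
 * Guarded so the sketch also compiles where that header is included.
 */
#ifndef NETIF_RSP_OKAY
#define NETIF_RSP_DROPPED  -2
#define NETIF_RSP_ERROR    -1
#define NETIF_RSP_OKAY      0
#define NETIF_RSP_NULL      1
#endif

/*
 * Hypothetical helper (not in netfront.c): return a printable name
 * for a netif response status, or "unknown" for any other value.
 */
const char *
netif_rsp_name(int16_t status)
{
	switch (status) {
	case NETIF_RSP_DROPPED:
		return "NETIF_RSP_DROPPED";
	case NETIF_RSP_ERROR:
		return "NETIF_RSP_ERROR";
	case NETIF_RSP_OKAY:
		return "NETIF_RSP_OKAY";
	case NETIF_RSP_NULL:
		return "NETIF_RSP_NULL";
	default:
		return "unknown";
	}
}
```

With such a helper, the warning in xn_txeof could print the name next to the number, which might make logs like the one in this report easier to read. This only names the status; it does not say why the backend returned an error.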
UPDATE: I've been trying to bisect with git. So far I have only bisected the changes here: https://github.com/freebsd/freebsd/commits/stable/10/sys/xen/xen_intr.h as they seem to affect netback/front and xn_intr.

All three commits from the following one onward (inclusive) are "bad":

Improve the Xen para-virtualized device infrastructure of FreeBSD: … gibbs authored on Oct 19, 2010 831bbfa

The one before it seems to be the same "version" that is running on stable/9.

Also, disabling txcsum stops the panic when I try to ssh in, but I still see the "xn_txeof: WARNING: response is -1" message and the connection eventually times out.

Fixed by https://svnweb.freebsd.org/base?view=revision&revision=299542. It should be MFCed to stable/10 soon, so closing this :)

*** Bug 182884 has been marked as a duplicate of this bug. ***