Bug 188369

Summary: [xen] [panic] FreeBSD 10 XENHVM panic under NetBSD Dom0 (xn_txeof: WARNING: response is -1)
Product: Base System
Reporter: miguelmclara
Component: kern
Assignee: freebsd-xen (Nobody) <xen>
Status: Closed FIXED
Severity: Affects Many People
CC: hugo
Priority: Normal
Version: Unspecified
Hardware: Any
OS: Any

Description miguelmclara 2014-04-08 01:00:00 UTC
Hi,

I've reported this to xen-devel, but it seems not many people use NetBSD as
a Dom0, so I was wondering if anyone here does.

I'm getting a panic and an error message (xn_txeof: WARNING: response is
-1) when I try to use ssh. The error also shows up for other network
operations, but without the panic.

Here is the panic I get after I try ssh:

xn_txeof: WARNING: response is -1!
panic: mbuf already on the free list, but we're trying to free it again!
cpuid = 0
KDB: enter: panic
[ thread pid 12 tid 100026 ]
Stopped at      kdb_enter+0x3e: movq    $0,kdb_why
db> bt
Tracing pid 12 tid 100026 td 0xfffffe000be79920
kdb_enter() at kdb_enter+0x3e/frame 0xffffff80002fe8c0
vpanic() at vpanic+0x146/frame 0xffffff80002fe900
kassert_panic() at kassert_panic+0x136/frame 0xffffff80002fe970
xn_txeof() at xn_txeof+0x99/frame 0xffffff80002fe9c0
xn_intr() at xn_intr+0x59/frame 0xffffff80002feab0
evtchn_interrupt() at evtchn_interrupt+0x1e6/frame 0xffffff80002feb30
intr_event_execute_handlers() at intr_event_execute_handlers+0x90/frame 0xffffff80002feb70
ithread_loop() at ithread_loop+0x148/frame 0xffffff80002febb0
fork_exit() at fork_exit+0x84/frame 0xffffff80002febf0
fork_trampoline() at fork_trampoline+0xe/frame 0xffffff80002febf0
--- trap 0, rip = 0, rsp = 0xffffff80002fecb0, rbp = 0 ---


I've tried a few revisions and even reverted all the way back to the last
one that was working on 9.x, which in stable/10 would be revision 251297,
but no luck.

I'm using NetBSD 6.1.2, by the way.

How-To-Repeat: Install FreeBSD 10 as a DomU on a NetBSD Dom0, enable ssh, and try to access it remotely; the machine panics.
Comment 1 miguelmclara 2014-04-11 00:42:20 UTC
NOTE: disabling TSO (ifconfig xn0 -tso) stops the panic, but ssh hangs and
the xn_txeof warning still shows!
I tried enabling/disabling other features (LRO, tx/rx csum), but I see no change.
Comment 2 miguelmclara 2014-04-14 01:22:34 UTC
There's another report about this here:

http://www.freebsd.org/cgi/query-pr.cgi?pr=182903

I was browsing through my reports on the xen list and noticed I even referred
to it at the time.
Should this be closed as a duplicate?

Also, I've been reporting and testing this since at least June 2013 (see
http://thr3ads.net/xen-users/2013/06/2653989-Re-FreeBSD-PVHVM-call-for-testing),
long before FreeBSD 10 was released.

The PR I mentioned is from Oct 2013.

Both reports are marked as non-critical and low-priority.

Should I assume FreeBSD is no longer supported on NetBSD Dom0s?

Please don't take this the wrong way. I understand not everything can be
done; I just want to know if I should forget about this and stay with
FreeBSD 9 or migrate the Dom0 to Linux!

Thanks
Comment 3 Mark Linimon freebsd_committer freebsd_triage 2014-04-20 02:48:45 UTC
State Changed
From-To: open->open

Over to maintainer(s). 


Comment 4 Mark Linimon freebsd_committer freebsd_triage 2014-04-20 02:48:45 UTC
Responsible Changed
From-To: freebsd-bugs->freebsd-xen
Comment 5 miguelmclara 2015-01-28 04:22:28 UTC
NOTE: NetBSD just announced support for Xen 4.5.

I've tested with FreeBSD 9.3, which works fine, but anything 10.0+ fails with this error!

And since FreeBSD 10 comes with XENHVM precompiled/loaded, any FreeBSD 10+ guest starts in a non-working state! :(
Comment 6 miguelmclara 2015-01-28 04:27:04 UTC
> And since FreeBSD10 comes with XENHVM precompiled/loaded any freebsd10+ guest starts in a none working state! :(

I should clarify that statement:
it starts in a non-working state with regard to the network.

Thanks
Comment 7 miguelmclara 2015-01-31 06:50:44 UTC
I don't know if this will help in any way... I lack C skills and I certainly don't know much about the Xen code...


In any case, I will post my findings:

I decided to compare the changes between 9 and 10, because 9 is working for me.

I noticed that the warning comes from netfront.c:
                        if (txr->status != NETIF_RSP_OKAY) {
                                printf("%s: WARNING: response is %d!\n",
                                       __func__, txr->status);
                        }

So I tried to find out what NETIF_RSP_OKAY is. It appears to come from sys/xen/interface/io/netif.h.

This is the diff of netif.h between the 9 and 10 (stable) sources. Could any of these changes be the cause?


% diff -u sys/xen/interface/io/netif.h /tmp/netif.h 
--- sys/xen/interface/io/netif.h        2014-04-11 01:41:58.000000000 +0100
+++ /tmp/netif.h        2015-01-31 06:42:58.000000000 +0000
@@ -41,7 +41,7 @@
 /*
  * This is the 'wire' format for packets:
  *  Request 1: netif_tx_request -- NETTXF_* (any flags)
- * [Request 2: netif_tx_extra]  (only if request 1 has NETTXF_extra_info)
+ * [Request 2: netif_tx_extra] (only if request 1 has NETTXF_extra_info)
  * [Request 3: netif_tx_extra] (only if request 2 has XEN_NETIF_EXTRA_FLAG_MORE)
  *  Request 4: netif_tx_request -- NETTXF_more_data
  *  Request 5: netif_tx_request -- NETTXF_more_data
@@ -173,6 +173,10 @@
 #define _NETRXF_extra_info     (3)
 #define  NETRXF_extra_info     (1U<<_NETRXF_extra_info)
 
+/* GSO Prefix descriptor. */
+#define _NETRXF_gso_prefix     (4)
+#define  NETRXF_gso_prefix     (1U<<_NETRXF_gso_prefix)
+
 struct netif_rx_response {
     uint16_t id;
     uint16_t offset;       /* Offset in page of start of received packet  */
Comment 8 miguelmclara 2015-01-31 06:52:35 UTC
PARTICULARLY THIS:

+/* GSO Prefix descriptor. */
+#define _NETRXF_gso_prefix     (4)
+#define  NETRXF_gso_prefix     (1U<<_NETRXF_gso_prefix)
+

GSO is not supported by the NetBSD backend. I had the same issue with the Windows GPLPV drivers a while back (see: http://mail-index.netbsd.org/port-xen/2013/12/12/msg008172.html)
Comment 9 miguelmclara 2015-01-31 14:44:11 UTC
One thing to note is that the panic is no longer the "mbuf already on the free list" one:



Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x3
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80800c65
stack pointer           = 0x28:0xfffffe00002b29d0
frame pointer           = 0x28:0xfffffe00002b2a20
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 12 (irq771: xn0)
trap number             = 12
panic: page fault
cpuid = 0
KDB: stack backtrace:
#0 0xffffffff8095b050 at kdb_backtrace+0x60
#1 0xffffffff8091f745 at panic+0x155
#2 0xffffffff80d1edff at trap_fatal+0x38f
#3 0xffffffff80d1f118 at trap_pfault+0x308
#4 0xffffffff80d1e77a at trap+0x47a
#5 0xffffffff80d04492 at calltrap+0x8
#6 0xffffffff808027e4 at xn_intr+0x74
#7 0xffffffff808f255b at intr_event_execute_handlers+0xab
#8 0xffffffff808f29a6 at ithread_loop+0x96
#9 0xffffffff808f017a at fork_exit+0x9a
#10 0xffffffff80d049ce at fork_trampoline+0xe
Comment 10 miguelmclara 2015-02-01 17:25:33 UTC
As requested on the freebsd-xen mailing list:

This is from a core dump after reboot (saved to /var/crash):

KDB: stack backtrace:
#0 0xffffffff8095b070 at kdb_backtrace+0x60
#1 0xffffffff8091f765 at panic+0x155
#2 0xffffffff80d1ee1f at trap_fatal+0x38f
#3 0xffffffff80d1f138 at trap_pfault+0x308
#4 0xffffffff80d1e79a at trap+0x47a
#5 0xffffffff80d044b2 at calltrap+0x8
#6 0xffffffff80802804 at xn_intr+0x74
#7 0xffffffff808f257b at intr_event_execute_handlers+0xab
#8 0xffffffff808f29c6 at ithread_loop+0x96
#9 0xffffffff808f019a at fork_exit+0x9a
#10 0xffffffff80d049ee at fork_trampoline+0xe
Uptime: 54s
Dumping 77 out of 727 MB:..21%..42%..63%..83%

#0  doadump (textdump=<value optimized out>) at pcpu.h:219
219     pcpu.h: No such file or directory.
        in pcpu.h
(kgdb) i li *xn_intr+0x74
Line 1209 of "/usr/src/sys/dev/xen/netfront/netfront.c"
   starts at address 0xffffffff808027fc <xn_intr+108>
   and ends at 0xffffffff80802809 <xn_intr+121>.
Current language:  auto; currently minimal


In case it is relevant, the backtrace:
(kgdb) bt
#0  doadump (textdump=<value optimized out>) at pcpu.h:219
#1  0xffffffff8091f3e2 in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:452
#2  0xffffffff8091f7a4 in panic (fmt=<value optimized out>)
    at /usr/src/sys/kern/kern_shutdown.c:759
#3  0xffffffff80d1ee1f in trap_fatal (frame=<value optimized out>, 
    eva=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:865
#4  0xffffffff80d1f138 in trap_pfault (frame=0xfffffe00002b2920, 
    usermode=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:676
#5  0xffffffff80d1e79a in trap (frame=0xfffffe00002b2920)
    at /usr/src/sys/amd64/amd64/trap.c:440
#6  0xffffffff80d044b2 in calltrap ()
    at /usr/src/sys/amd64/amd64/exception.S:236
#7  0xffffffff80800c82 in xn_txeof (np=0xfffffe00009e4000)
    at /usr/src/sys/dev/xen/netfront/netfront.c:1137
#8  0xffffffff80802804 in xn_intr (xsc=0xfffffe00009e4000)
    at /usr/src/sys/dev/xen/netfront/netfront.c:1209
#9  0xffffffff808f257b in intr_event_execute_handlers (
    p=<value optimized out>, ie=0xfffff800025db700)
    at /usr/src/sys/kern/kern_intr.c:1264
#10 0xffffffff808f29c6 in ithread_loop (arg=0xfffff800025d5520)
    at /usr/src/sys/kern/kern_intr.c:1277
#11 0xffffffff808f019a in fork_exit (
    callout=0xffffffff808f2930 <ithread_loop>, arg=0xfffff800025d5520, 
    frame=0xfffffe00002b2c00) at /usr/src/sys/kern/kern_fork.c:996
#12 0xffffffff80d049ee in fork_trampoline ()
    at /usr/src/sys/amd64/amd64/exception.S:611
#13 0x0000000000000000 in ?? ()
Comment 11 miguelmclara 2015-02-09 23:07:22 UTC
Is the kgdb info helpful? Is there any extra info I can add?
Comment 12 miguelmclara 2015-02-15 22:44:01 UTC
UPDATE: I've been trying to bisect with git. So far I've only tried bisecting the changes here:

https://github.com/freebsd/freebsd/commits/stable/10/sys/xen/xen_intr.h

because they seem to affect netback/netfront and xn_intr.


All three commits after, and including, the following one are "bad":
Improve the Xen para-virtualized device infrastructure of FreeBSD: …
gibbs authored on Oct 19, 2010
831bbfa


The commit before that appears to be the same version that is running on stable/9.


Also, disabling txcsum stops the panic when I try to ssh in, but I still see the "xn_txeof: WARNING: response is -1" message and the connection eventually times out.
Comment 13 miguelmclara 2016-05-13 02:02:01 UTC
Fixed by https://svnweb.freebsd.org/base?view=revision&revision=299542

It should be MFCed to stable/10 soon, so I'm closing this :)
Comment 14 Roger Pau Monné freebsd_committer freebsd_triage 2016-05-20 09:04:45 UTC
*** Bug 182884 has been marked as a duplicate of this bug. ***