Bug 249123 - PVH domU not migrating from one XEN host to another
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: amd64 Any
Importance: --- Affects Many People
Assignee: freebsd-xen (Nobody)
Reported: 2020-09-05 07:48 UTC by Pierre-Philipp Braun
Modified: 2020-09-07 09:46 UTC
Description Pierre-Philipp Braun 2020-09-05 07:48:27 UTC
The Xen farm runs the 4.14 release.
The dom0s run Slackware current (Aug 2020) with a downgraded Linux kernel, 4.18.20.

I tried with FreeBSD 13.0-CURRENT, but I suppose this would also reproduce with the latest stable release. A PVH domU does not migrate from one Xen host to another. I did not try PV, since PVH is available now.

pro5s1# xl migrate freebsd pro5s2
migration target: Ready to receive domain.
Saving to migration stream new xl format (info 0x3/0x0/2176)
Loading new save file <incoming migration stream> (new xl fmt info 0x3/0x0/2176)
 Savefile contains xl domain config in JSON format
Parsing config from <saved>
xc: info: Saving domain 41, type x86 HVM
xc: info: Found x86 HVM domain from Xen 4.14
xc: info: Restoring domain
libxl: error: libxl_dom_suspend.c:362:suspend_common_wait_guest_timeout: Domain 41:guest did not suspend, timed out
xc: error: save callback suspend() failed: 0: Internal error
xc: error: Save failed (0 = Success): Internal error
libxl: error: libxl_stream_write.c:347:libxl__xc_domain_save_done: Domain 41:saving domain: domain responded to suspend request: Success
migration sender: libxl_domain_suspend failed (rc=-8)
xc: error: Failed to read Record Header from stream (0 = Success): Internal error
xc: error: Restore failed (0 = Success): Internal error
libxl: error: libxl_stream_read.c:850:libxl__xc_domain_restore_done: restoring domain: Success
libxl: error: libxl_create.c:1576:domcreate_rebuild_done: Domain 12:cannot (re-)build domain: -3
libxl: error: libxl_domain.c:1182:libxl__destroy_domid: Domain 12:Non-existant domain
libxl: error: libxl_domain.c:1136:domain_destroy_callback: Domain 12:Unable to destroy guest
libxl: error: libxl_domain.c:1063:domain_destroy_cb: Domain 12:Destruction of domain failed
migration target: Domain creation failed (code -3).
libxl: info: libxl_exec.c:117:libxl_report_child_exitstatus: migration transport process [19981] exited with error status 1
Migration failed, failed to suspend at sender.
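
The output above contains several errors, but most of them look like fallout from the first one. As a rough aid for reading such logs, here is a minimal Python sketch that picks out the first `libxl: error:` line. The regular expression and the function name `first_libxl_error` are my own assumptions about the line layout seen here, not part of any Xen tool, and the sample is a shortened copy of the log with the wrapped line joined.

```python
import re

# Assumed layout, taken from the lines in this report:
#   libxl: error: <file>:<line>:<function>: <message>
LIBXL_ERROR = re.compile(
    r"^libxl: error: (?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+): (?P<msg>.*)$"
)

def first_libxl_error(log_text):
    """Return (file, line, function, message) of the first libxl error, or None."""
    for raw in log_text.splitlines():
        m = LIBXL_ERROR.match(raw.strip())
        if m:
            return m["file"], int(m["line"]), m["func"], m["msg"]
    return None

# Shortened excerpt of the migration output from this report.
sample = """\
xc: info: Restoring domain
libxl: error: libxl_dom_suspend.c:362:suspend_common_wait_guest_timeout: Domain 41:guest did not suspend, timed out
xc: error: save callback suspend() failed: 0: Internal error
libxl: error: libxl_create.c:1576:domcreate_rebuild_done: Domain 12:cannot (re-)build domain: -3
"""

print(first_libxl_error(sample))
```

On this excerpt the first match is the suspend timeout on the sender, which agrees with the final "failed to suspend at sender" message.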

The guest console shows:

lock order reversal:
 1st 0xfffffe004d0a4018 xnrx_0 (netfront receive lock, sleep mutex) @ /usr/src/sys/dev/xen/netfront/netfront.c:423
 2nd 0xfffffe004d0a8018 xntx_0 (netfront transmit lock, sleep mutex) @ /usr/src/sys/dev/xen/netfront/netfront.c:424
 3rd 0xfffffe004d0a4d28 xnrx_1 (netfront receive lock, sleep mutex) @ /usr/src/sys/dev/xen/netfront/netfront.c:423
lock order netfront receive lock -> netfront transmit lock established at:
#0 0xffffffff80c4408d at witness_checkorder+0x46d
#1 0xffffffff80bb3eb4 at __mtx_lock_flags+0x94
#2 0xffffffff80a60ab4 at gnttab_resume+0xad04
#3 0xffffffff80c12373 at bus_generic_suspend_child+0x43
#4 0xffffffff80c12446 at bus_generic_suspend+0x66
#5 0xffffffff80c12373 at bus_generic_suspend_child+0x43
#6 0xffffffff80c12446 at bus_generic_suspend+0x66
#7 0xffffffff80a6787b at xs_unlock+0x35b
#8 0xffffffff80c12373 at bus_generic_suspend_child+0x43
#9 0xffffffff80c12446 at bus_generic_suspend+0x66
#10 0xffffffff80c12373 at bus_generic_suspend_child+0x43
#11 0xffffffff80c12446 at bus_generic_suspend+0x66
#12 0xffffffff80c12373 at bus_generic_suspend_child+0x43
#13 0xffffffff80c12446 at bus_generic_suspend+0x66
#14 0xffffffff80a54bca at xc_printf+0x162a
#15 0xffffffff80a548ae at xc_printf+0x130e
#16 0xffffffff80a67bd9 at xs_unlock+0x6b9
#17 0xffffffff80b92b30 at fork_exit+0x80
lock order netfront transmit lock -> netfront receive lock attempted at:
#0 0xffffffff80c449ec at witness_checkorder+0xdcc
#1 0xffffffff80bb3eb4 at __mtx_lock_flags+0x94
#2 0xffffffff80a60a9a at gnttab_resume+0xacea
#3 0xffffffff80c12373 at bus_generic_suspend_child+0x43
#4 0xffffffff80c12446 at bus_generic_suspend+0x66
#5 0xffffffff80c12373 at bus_generic_suspend_child+0x43
#6 0xffffffff80c12446 at bus_generic_suspend+0x66
#7 0xffffffff80a6787b at xs_unlock+0x35b
#8 0xffffffff80c12373 at bus_generic_suspend_child+0x43
#9 0xffffffff80c12446 at bus_generic_suspend+0x66
#10 0xffffffff80c12373 at bus_generic_suspend_child+0x43
#11 0xffffffff80c12446 at bus_generic_suspend+0x66
#12 0xffffffff80c12373 at bus_generic_suspend_child+0x43
#13 0xffffffff80c12446 at bus_generic_suspend+0x66
#14 0xffffffff80a54bca at xc_printf+0x162a
#15 0xffffffff80a548ae at xc_printf+0x130e
#16 0xffffffff80a67bd9 at xs_unlock+0x6b9
#17 0xffffffff80b92b30 at fork_exit+0x80
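
To illustrate why taking xnrx_0, xntx_0 and then xnrx_1 gets reported, here is a toy Python model of a lock-order check by lock class. It is only a sketch: `find_reversal` is a hypothetical helper, it is far simpler than the real WITNESS code, and it ignores two locks of the same class being held together.

```python
def find_reversal(acquisitions):
    """Return the first (held_class, wanted_class) pair that contradicts an
    order recorded earlier, or None if no reversal is seen.

    `acquisitions` is a list of (lock_name, lock_class) tuples taken in
    order by one thread, with nothing released in between.
    """
    established = set()  # (earlier_class, later_class) pairs seen so far
    held = []            # classes of the locks currently held, oldest first
    for _name, cls in acquisitions:
        for earlier in held:
            if earlier == cls:
                continue  # toy model: same-class pairs are not checked
            if (cls, earlier) in established:
                return (earlier, cls)
            established.add((earlier, cls))
        held.append(cls)
    return None

RX = "netfront receive lock"
TX = "netfront transmit lock"

# Acquisition order as listed in the reversal report above.
per_queue = [("xnrx_0", RX), ("xntx_0", TX), ("xnrx_1", RX)]
print(find_reversal(per_queue))

# A different order (all receive locks, then all transmit locks).
grouped = [("xnrx_0", RX), ("xnrx_1", RX), ("xntx_0", TX), ("xntx_1", TX)]
print(find_reversal(grouped))
```

In this model the first sequence records receive -> transmit and then trips on transmit -> receive when the second queue's receive lock is taken, which is the pair named in the report. The grouped order is not flagged by the toy check; whether it is suitable for the driver is not something this sketch establishes.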
kernel trap 12 with interrupts disabled


Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 02
fault virtual address   = 0x8
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80fe4894
stack pointer           = 0x28:0xfffffe000bafd8f0
frame pointer           = 0x28:0xfffffe000bafd900
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = resume, IOPL = 0
current process         = 11 (idle: cpu1)
trap number             = 12
kernel trap 12 with interrupts disabled


Fatal trap 12: page fault while in kernel mode
cpuid = 7; apic id = 0e
fault virtual address   = 0x38
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80fe4894
stack pointer           = 0x28:0xfffffe000bb1b8f0
frame pointer           = 0x28:0xfffffe000bb1b900
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = resume, IOPL = 0
current process         = 11 (idle: cpu7)
trap number             = 12
kernel trap 12 with interrupts disabled


Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 04
fault virtual address   = 0x10
kernel trap 12 with interrupts disabled


Fatal trap 12: page fault while in kernel mode
cpuid = 5; apic id = 0a
fault virtual address   = 0x28
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80fe4894
stack pointer           = 0x28:0xfffffe000bb118f0
frame pointer           = 0x28:0xfffffe000bb11900
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = resume, IOPL = 0
current process         = 11 (idle: cpu5)
trap number             = 12
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80fe4894
stack pointer           = 0x28:0xfffffe000bb028f0
frame pointer           = 0x28:0xfffffe000bb02900
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = resume, IOPL = 0
current process         = 11 (idle: cpu2)
trap number             = 12
kernel trap 12 with interrupts disabled


Fatal trap 12: page fault while in kernel mode
cpuid = 4; apic id = 08
fault virtual address   = 0x20
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80fe4894
stack pointer           = 0x28:0xfffffe0043aab860
frame pointer           = 0x28:0xfffffe0043aab870
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = resume, IOPL = 0
current process         = 12 (irq2096: xc0)
trap number             = 12
kernel trap 12 with interrupts disabled


Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 06
fault virtual address   = 0x18
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80fe4894
stack pointer           = 0x28:0xfffffe000bb078f0
frame pointer           = 0x28:0xfffffe000bb07900
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = resume, IOPL = 0
current process         = 11 (idle: cpu3)
trap number             = 12
kernel trap 12 with interrupts disabled


Fatal trap 12: page fault while in kernel mode
cpuid = 6; apic id = 0c
fault virtual address   = 0x30
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80fe4894
stack pointer           = 0x28:0xfffffe000bb168f0
frame pointer           = 0x28:0xfffffe000bb16900
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = resume, IOPL = 0
current process         = 11 (idle: cpu6)
trap number             = 12
timeout stopping cpus
panic: page fault
cpuid = 3
time = 1599291086
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe000bb075a0
vpanic() at vpanic+0x182/frame 0xfffffe000bb075f0
panic() at panic+0x43/frame 0xfffffe000bb07650
trap_fatal() at trap_fatal+0x387/frame 0xfffffe000bb076b0
trap_pfault() at trap_pfault+0x97/frame 0xfffffe000bb07710
trap() at trap+0x2ab/frame 0xfffffe000bb07820
calltrap() at calltrap+0x8/frame 0xfffffe000bb07820
--- trap 0xc, rip = 0xffffffff80fe4894, rsp = 0xfffffe000bb078f0, rbp = 0xfffffe000bb07900 ---
cpususpend_handler() at cpususpend_handler+0x34/frame 0xfffffe000bb07900
xen_cpususpend_handler() at xen_cpususpend_handler+0x9/frame 0xfffffe000bb07910
intr_event_handle() at intr_event_handle+0xde/frame 0xfffffe000bb07960
intr_execute_handlers() at intr_execute_handlers+0x66/frame 0xfffffe000bb07990
xen_intr_handle_upcall() at xen_intr_handle_upcall+0x1c6/frame 0xfffffe000bb079e0
Xxen_intr_upcall() at Xxen_intr_upcall+0xb1/frame 0xfffffe000bb079e0
--- interrupt, rip = 0xffffffff80fdac72, rsp = 0xfffffe000bb07ab0, rbp = 0xfffffe000bb07ac0 ---
cpu_idle_acpi() at cpu_idle_acpi+0x42/frame 0xfffffe000bb07ac0
cpu_idle() at cpu_idle+0x9f/frame 0xfffffe000bb07ae0
sched_idletd() at sched_idletd+0x3d1/frame 0xfffffe000bb07bb0
fork_exit() at fork_exit+0x80/frame 0xfffffe000bb07bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe000bb07bf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
[ thread pid 11 tid 100006 ]
Stopped at      kdb_enter+0x37: movq    $0,0x10b6606(%rip)
db> trace
Tracing pid 11 tid 100006 td 0xfffffe000e3b7e00
kdb_enter() at kdb_enter+0x37/frame 0xfffffe000bb075a0
vpanic() at vpanic+0x19e/frame 0xfffffe000bb075f0
panic() at panic+0x43/frame 0xfffffe000bb07650
trap_fatal() at trap_fatal+0x387/frame 0xfffffe000bb076b0
trap_pfault() at trap_pfault+0x97/frame 0xfffffe000bb07710
trap() at trap+0x2ab/frame 0xfffffe000bb07820
calltrap() at calltrap+0x8/frame 0xfffffe000bb07820
--- trap 0xc, rip = 0xffffffff80fe4894, rsp = 0xfffffe000bb078f0, rbp = 0xfffffe000bb07900 ---
cpususpend_handler() at cpususpend_handler+0x34/frame 0xfffffe000bb07900
xen_cpususpend_handler() at xen_cpususpend_handler+0x9/frame 0xfffffe000bb07910
intr_event_handle() at intr_event_handle+0xde/frame 0xfffffe000bb07960
intr_execute_handlers() at intr_execute_handlers+0x66/frame 0xfffffe000bb07990
xen_intr_handle_upcall() at xen_intr_handle_upcall+0x1c6/frame 0xfffffe000bb079e0
Xxen_intr_upcall() at Xxen_intr_upcall+0xb1/frame 0xfffffe000bb079e0
--- interrupt, rip = 0xffffffff80fdac72, rsp = 0xfffffe000bb07ab0, rbp = 0xfffffe000bb07ac0 ---
cpu_idle_acpi() at cpu_idle_acpi+0x42/frame 0xfffffe000bb07ac0
cpu_idle() at cpu_idle+0x9f/frame 0xfffffe000bb07ae0
sched_idletd() at sched_idletd+0x3d1/frame 0xfffffe000bb07bb0
fork_exit() at fork_exit+0x80/frame 0xfffffe000bb07bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe000bb07bf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
db>
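
One pattern in the trap messages above: on every CPU the fault virtual address equals cpuid * 8, and the instruction pointer is the same (inside cpususpend_handler per the backtrace). That would be consistent with reading element [cpuid] of an array of 8-byte pointers whose base is NULL. This is only an inference from the log, not a confirmed diagnosis. The small Python check below uses the values copied from the console output; the function name is hypothetical.

```python
# (cpuid -> fault virtual address) copied from the trap messages above.
faults = {1: 0x8, 7: 0x38, 2: 0x10, 5: 0x28, 4: 0x20, 3: 0x18, 6: 0x30}

POINTER_SIZE = 8  # bytes per pointer on amd64

def matches_null_pointer_array(fault_map, elem_size=POINTER_SIZE):
    """True if every fault address equals cpuid * elem_size, i.e. the
    address of slot [cpuid] in an array that starts at address 0."""
    return all(addr == cpu * elem_size for cpu, addr in fault_map.items())

print(matches_null_pointer_array(faults))
```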

FWIW, I also filed a similar report for NetBSD a few months ago:

netbsd domU does not migrate properly from one xen host to another
http://gnats.netbsd.org/55207
Comment 1 Roger Pau Monné freebsd_committer 2020-09-07 09:46:53 UTC
(In reply to Pierre-Philipp Braun from comment #0)
Thanks for the report. The latest -RELEASE versions (11 and 12) should be fine, as they are tested by the Xen test system and migrate correctly, see:

http://logs.test-lab.xenproject.org/osstest/logs/153813/

Grep for the *freebsd* tests. I will look into what is going on with -CURRENT.