Hello,

It seems like some VIMAGE changes have broken netfront suspend/resume. Here is the trace on resume from suspension:

    Fatal trap 12: page fault while in kernel mode
    cpuid = 0; apic id = 00
    fault virtual address   = 0x28
    fault code              = supervisor read data, page not present
    instruction pointer     = 0x20:0xffffffff80c92f18
    stack pointer           = 0x28:0xfffffe0000571670
    frame pointer           = 0x28:0xfffffe0000571690
    code segment            = base rx0, limit 0xfffff, type 0x1b
                            = DPL 0, pres 1, long 1, def32 0, gran 1
    processor eflags        = interrupt enabled, resume, IOPL = 0
    current process         = 16 (xenwatch)
    Tracing pid 16 tid 100074 td 0xfffff80003890580
    ether_output_frame() at ether_output_frame+0x58/frame 0xfffffe0000571690
    ether_output() at ether_output+0x68b/frame 0xfffffe0000571730
    arprequest() at arprequest+0x444/frame 0xfffffe0000571840
    arp_ifinit() at arp_ifinit+0x58/frame 0xfffffe0000571880
    netfront_backend_changed() at netfront_backend_changed+0x1b4/frame 0xfffffe0000571940
    xenwatch_thread() at xenwatch_thread+0x182/frame 0xfffffe0000571970
    fork_exit() at fork_exit+0x84/frame 0xfffffe00005719b0
    fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00005719b0

The code that triggers the bug in netfront is:

    /**
     * If this interface has an ipv4 address, send an arp for it. This
     * helps to get the network going again after migrating hosts.
     */
    static void
    netfront_send_fake_arp(device_t dev, struct netfront_info *info)
    {
        struct ifnet *ifp;
        struct ifaddr *ifa;

        ifp = info->xn_ifp;
        CK_STAILQ_FOREACH(ifa, &ifp->if_addrhead, ifa_link) {
            if (ifa->ifa_addr->sa_family == AF_INET) {
                arp_ifinit(ifp, ifa);
            }
        }
    }

I really have no idea what's missing here, so any help from people involved with VIMAGE in fixing this regression would be greatly appreciated.
This likely means you wound up calling arp_ifinit() without a curvnet set. The curvnet is essential context for basically all networking operations, but it is not set when the call comes from a kernel thread, as appears to be the case here (the trace shows xenwatch_thread()).

Usually all you have to do is ensure that the correct vnet is set. This is untested (and may not even compile), but might fix it:

    diff --git a/sys/dev/xen/netfront/netfront.c b/sys/dev/xen/netfront/netfront.c
    index ba8ac3caf7f..12938354f9c 100644
    --- a/sys/dev/xen/netfront/netfront.c
    +++ b/sys/dev/xen/netfront/netfront.c
    @@ -962,6 +962,8 @@ netfront_backend_changed(device_t dev, XenbusState newstate)
     
            DPRINTK("newstate=%d\n", newstate);
     
    +       CURVNET_SET(sc->xn_ifp->vnet);
    +
            switch (newstate) {
            case XenbusStateInitialising:
            case XenbusStateInitialised:
    @@ -994,6 +996,8 @@ netfront_backend_changed(device_t dev, XenbusState newstate)
     #endif
                    break;
            }
    +
    +       CURVNET_RESTORE();
     }
     
     /**
(In reply to Kristof Provost from comment #1)

Thanks! That does indeed seem to solve the issue. It just has a minor typo: s/vnet/if_vnet/.

Would you like to commit it?
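For readers unfamiliar with the pattern, here is a rough user-space sketch of what the patch does, with the if_vnet correction applied. It is only a model, not kernel code: struct vnet, struct ifnet, the global curvnet, and both macros are simplified stand-ins defined in the snippet itself, and model_arp_ifinit() and model_backend_changed() are hypothetical names made up for illustration. The stand-in routine returns -1 where the real kernel would fault.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (assumptions, not the real layouts). */
struct vnet {
    int arp_requests_sent;
};

struct ifnet {
    struct vnet *if_vnet;   /* the vnet this interface belongs to */
};

/*
 * Models the current-vnet pointer. In the kernel this is per-thread state;
 * a plain global is enough for this single-threaded sketch. It starts out
 * NULL, like in a kernel thread that never set it.
 */
static struct vnet *curvnet = NULL;

/* Simplified versions of the macros: save the old value, set, later restore. */
#define CURVNET_SET(arg)                    \
    struct vnet *saved_vnet = curvnet;      \
    curvnet = (arg)
#define CURVNET_RESTORE()                   \
    curvnet = saved_vnet

/* Hypothetical stand-in for a networking routine that depends on curvnet. */
static int
model_arp_ifinit(struct ifnet *ifp)
{
    (void)ifp;
    if (curvnet == NULL)
        return (-1);        /* the real kernel would page fault here */
    curvnet->arp_requests_sent++;
    return (0);
}

/* Hypothetical stand-in for the patched callback: set, call, restore. */
static int
model_backend_changed(struct ifnet *ifp)
{
    int error;

    CURVNET_SET(ifp->if_vnet);
    error = model_arp_ifinit(ifp);
    CURVNET_RESTORE();
    return (error);
}
```

Calling model_arp_ifinit() directly fails in this model because curvnet is still NULL, while going through model_backend_changed() succeeds and leaves curvnet as it found it.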
A commit references this bug:

Author: kp
Date: Thu Aug 23 16:52:53 UTC 2018
New revision: 338256
URL: https://svnweb.freebsd.org/changeset/base/338256

Log:
  xen/netfront: Ensure curvnet is set

  netfront_backend_changed() is called from the xenwatch_thread(), which means
  that the curvnet is not set. We have to set it before we can call things like
  arp_ifinit().

  PR: 230845

Changes:
  head/sys/dev/xen/netfront/netfront.c