Bug 230845 - VIMAGE regression: breaks netfront suspend/resume
Summary: VIMAGE regression: breaks netfront suspend/resume
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: CURRENT
Hardware: amd64 Any
: --- Affects Many People
Assignee: Kristof Provost
URL:
Keywords: crash, regression, vimage
Depends on:
Blocks:
 
Reported: 2018-08-23 15:25 UTC by Roger Pau Monné
Modified: 2018-09-29 10:35 UTC (History)
3 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Roger Pau Monné freebsd_committer 2018-08-23 15:25:07 UTC
Hello,

It seems like some VIMAGE changes have broken netfront suspend/resume, here is the trace on resume from suspension:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x28
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80c92f18
stack pointer           = 0x28:0xfffffe0000571670
frame pointer           = 0x28:0xfffffe0000571690
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 16 (xenwatch)

Tracing pid 16 tid 100074 td 0xfffff80003890580
ether_output_frame() at ether_output_frame+0x58/frame 0xfffffe0000571690
ether_output() at ether_output+0x68b/frame 0xfffffe0000571730
arprequest() at arprequest+0x444/frame 0xfffffe0000571840
arp_ifinit() at arp_ifinit+0x58/frame 0xfffffe0000571880
netfront_backend_changed() at netfront_backend_changed+0x1b4/frame 0xfffffe0000571940
xenwatch_thread() at xenwatch_thread+0x182/frame 0xfffffe0000571970
fork_exit() at fork_exit+0x84/frame 0xfffffe00005719b0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00005719b0

The code that triggers the bug in netfront is:

/**
 * If this interface has an ipv4 address, send an arp for it. This
 * helps to get the network going again after migrating hosts.
 */
static void
netfront_send_fake_arp(device_t dev, struct netfront_info *info)
{
	struct ifnet *ifp;
	struct ifaddr *ifa;

	ifp = info->xn_ifp;
	CK_STAILQ_FOREACH(ifa, &ifp->if_addrhead, ifa_link) {
		if (ifa->ifa_addr->sa_family == AF_INET) {
			arp_ifinit(ifp, ifa);
		}
	}
}

I really have no idea of what's missing here, so any help by people involved with VIMAGE in order to fix this regression would be greatly appreciated.
Comment 1 Kristof Provost freebsd_committer 2018-08-23 16:03:40 UTC
This likely means you wound up calling arp_ifnet() without a curvnet set.
That's important context for basically all networking operations, but it's not going to be set when called from a kernel thread, as appears to be the case here.

Usually all you have to do is ensure that the correct vnet is set.

This is untested (and may not even compile), but might fix it:

diff --git a/sys/dev/xen/netfront/netfront.c b/sys/dev/xen/netfront/netfront.c
index ba8ac3caf7f..12938354f9c 100644
--- a/sys/dev/xen/netfront/netfront.c
+++ b/sys/dev/xen/netfront/netfront.c
@@ -962,6 +962,8 @@ netfront_backend_changed(device_t dev, XenbusState newstate)

        DPRINTK("newstate=%d\n", newstate);

+       CURVNET_SET(sc->xn_ifp->vnet);
+
        switch (newstate) {
        case XenbusStateInitialising:
        case XenbusStateInitialised:
@@ -994,6 +996,8 @@ netfront_backend_changed(device_t dev, XenbusState newstate)
 #endif
                break;
        }
+
+       CURVNET_RESTORE();
 }

 /**
Comment 2 Roger Pau Monné freebsd_committer 2018-08-23 16:26:18 UTC
(In reply to Kristof Provost from comment #1)
Thanks! That does indeed seems to solve the issue, it just has a minor typo: s/vnet/if_vnet/. Would you like to commit it?
Comment 3 commit-hook freebsd_committer 2018-08-23 16:53:34 UTC
A commit references this bug:

Author: kp
Date: Thu Aug 23 16:52:53 UTC 2018
New revision: 338256
URL: https://svnweb.freebsd.org/changeset/base/338256

Log:
  xen/netfront: Ensure curvnet is set

  netfront_backend_changed() is called from the xenwatch_thread(), which means
  that the curvnet is not set. We have to set it before we can call things like
  arp_ifinit().

  PR:		230845

Changes:
  head/sys/dev/xen/netfront/netfront.c