Created attachment 188497
Output from pciconf -lvbc

I get a lockup, an LOR, or similar when resuming from suspend with the network cable plugged in. This is on a Lenovo ThinkPad X270. The machine resumes fine, but after a little while (minutes, if not less) it freezes. I can feel it get warm and the fans spin up, as if the CPU is working at 100%. It feels like the lockup happens once there's traffic on the NIC after the resume. Suspend/resume works fine when the NIC is not in use (such as when using WiFi).

Nothing is printed when this happens: the screen just freezes as it was, with no reaction to keyboard input and nothing on the console. After a reboot, however, /var/log/messages contains the following, which is what led me to em0:

kernel: reversal:
kernel: em0:tx(0):callo (em0:tx(0):callo) @ /usr/src/sys/kern/kern_mutex.c:182
kernel: /usr/src/sys/net/iflib.c:2143
kernel: backtrace:
kernel: #0 0xffffffff805a3e93 at witness_debugger+0x73
kernel: #1 0xffffffff805a3d12 at witness_checkorder+0xe02
kernel: #2 0xffffffff8051fd6c at __mtx_lock_flags+0x9c
kernel: #3 0xffffffff80653789 at iflib_timer+0x149
kernel: #4 0xffffffff8055856c at softclock_call_cc+0x14c
kernel: #5 0xffffffff8055892c at softclock+0x7c
kernel: #6 0xffffffff805046a9 at intr_event_execute_handlers+0x99
kernel: #7 0xffffffff80504d96 at ithread_loop+0xb6
kernel: #8 0xffffffff80501ae4 at fork_exit+0x84
kernel: #9 0xffffffff8087718e at fork_trampoline+0xe

System is:
FreeBSD garnet.daemonic.se 12.0-CURRENT FreeBSD 12.0-CURRENT #0 r325963M: Sat Nov 18 14:01:30 CET 2017 root@garnet.daemonic.se:/usr/obj/usr/src/amd64.amd64/sys/GARNET amd64

Also attached is the output from pciconf -lvbc.
Updated to the latest source (r326539) and the deadlock is still there. The trace is the same; the only difference is the line number in sys/kern/kern_mutex.c, which is now 184.
Here's a bug report I was just about to file when I heard about this one.

---
if_em is a loadable module. Networking on em0 does not work after suspend/resume. The ifconfig output gets stuck after the "nd6 options..." line.

procstat -ak:
---
ifconfig - mi_switch turnstile_wait __mtx_lock_sleep __mtx_lock_flags iflib_media_status ifmedia_ioctl ifioctl ...
---

While I was writing this on another machine, the machine with the stuck ifconfig rebooted by itself (about 5 minutes after resuming and issuing the ifconfig command).

backtrace:
#0 doadump (textdump=0) at pcpu.h:230
#1 0xffffffff81d94528 in vt_kms_postswitch () from /boot/modules.drm-v4.9/drm.ko
#2 0xffffffff80543b78 in vt_window_switch (vw=0xffffffff80c99e28) at /usr/src/sys/dev/vt/vt_core.c:563
#3 0xffffffff805412a0 in vtterm_cngrab (tm=<value optimized out>) at /usr/src/sys/dev/vt/vt_core.c:1530
#4 0xffffffff80648162 in cngrab () at /usr/src/sys/kern/kern_cons.c:370
#5 0xffffffff806a8acb in vpanic (fmt=0xffffffff80b0fac3 "%s: possible deadlock detected for %p, blocked for %d ticks\n", ap=0xfffffe00407f2a00) at /usr/src/sys/kern/kern_shutdown.c:786
#6 0xffffffff806a8c03 in panic (fmt=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:739
#7 0xffffffff806429dc in deadlkres () at /usr/src/sys/kern/kern_clock.c:242
#8 0xffffffff80669144 in fork_exit (callout=0xffffffff80642680 <deadlkres>, arg=0x0, frame=0xfffffe00407f2ac0) at /usr/src/sys/kern/kern_fork.c:1039
#9 0xffffffff809f9dbe in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:843
#10 0x0000000000000000 in ?? ()
Current language: auto; currently minimal
(kgdb)
(In reply to Niclas Zeising from comment #1)

Please try the patch below. The problem seems to be caused by the call to iflib_init_locked() inside iflib_device_resume(): iflib_init_locked() should only be invoked after iflib_stop().

The i219 probably also has a problem when its PCI power state changes to D3. You may lose your network connection after resume because the device fails to wake up. As a workaround, hw.pci.do_power_suspend=0 prevents the change from D1 to D3 state, at the cost of higher power consumption.

--- sys/net/iflib.c (revision 330961)
+++ sys/net/iflib.c (working copy)
@@ -4526,6 +4526,7 @@
         if_ctx_t ctx = device_get_softc(dev);

         CTX_LOCK(ctx);
+        iflib_stop(ctx);
         IFDI_SUSPEND(ctx);
         CTX_UNLOCK(ctx);
I have this problem on two Lenovo laptops with a 12.0-REL install, although not after every wakeup.

One is an X220. It logs this in syslog (not in all cases):

Dec 16 13:32:08 udog kernel: em0: TX(0) desc avail = 1024, pidx = 0

The other is an X201, which worked fine for years under 11.XpX.

Both have hw.pci.do_power_suspend=0 in /etc/sysctl.conf. I'll try the patch given in comment 3 now and report back.
It looks like the patch from comment #3 fixes the problem.
Could you try this patch instead?

Index: iflib.c
===================================================================
--- iflib.c (revision 341824)
+++ iflib.c (working copy)
@@ -4894,7 +4894,7 @@

         CTX_LOCK(ctx);
         IFDI_RESUME(ctx);
-        iflib_init_locked(ctx);
+        iflib_if_init_locked(ctx);
         CTX_UNLOCK(ctx);
         for (int i = 0; i < NTXQSETS(ctx); i++, txq++)
                 iflib_txq_check_drain(txq, IFLIB_RESTART_BUDGET);
Kernel is installed, tests pending. I'll get back to you in approx. 24h.
Test looks fine on X220 with hw.pci.do_power_suspend on default 1.
A commit references this bug:

Author: shurd
Date: Mon Jan 7 23:46:54 UTC 2019
New revision: 342855
URL: https://svnweb.freebsd.org/changeset/base/342855

Log:
  Use iflib_if_init_locked() during resume instead of iflib_init_locked().

  iflib_init_locked() assumes that iflib_stop() has been called, however,
  it is not called for suspend. iflib_if_init_locked() calls stop then init,
  so fixes the problem.

  This was causing errors after a resume from suspend.

  PR: 224059
  Reported by: zeising
  MFC after: 1 week
  Sponsored by: Limelight Networks

Changes:
  head/sys/net/iflib.c
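The reasoning in the commit log above can be illustrated with a small user-space model. This is only a sketch, not kernel code: struct model_ctx and every model_* function are hypothetical stand-ins for if_ctx_t, iflib_stop(), iflib_init_locked(), and iflib_if_init_locked(), and the "timer armed twice" detail is an assumption chosen to make the broken precondition visible, not a claim about what iflib actually does internally.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver context; NOT the real if_ctx_t. */
struct model_ctx {
	bool running;      /* interface was initialized and not stopped since */
	int  timers_armed; /* illustrative assumption: timers currently scheduled */
};

/* Models iflib_stop(): quiesce the interface and cancel its timers. */
void model_stop(struct model_ctx *ctx)
{
	ctx->running = false;
	ctx->timers_armed = 0;
}

/*
 * Models iflib_init_locked(): per the commit log, it assumes a stop has
 * already happened. Returns false when that precondition was violated.
 */
bool model_init_locked(struct model_ctx *ctx)
{
	bool precondition_ok = !ctx->running && ctx->timers_armed == 0;

	ctx->running = true;
	ctx->timers_armed += 1; /* a stale timer would now be armed twice */
	return precondition_ok;
}

/* Models iflib_if_init_locked(): stop first, then init. */
bool model_if_init_locked(struct model_ctx *ctx)
{
	model_stop(ctx);
	assert(!ctx->running); /* init's precondition now holds */
	return model_init_locked(ctx);
}

/* Models suspend as described in the commit log: no stop is performed. */
void model_suspend(struct model_ctx *ctx)
{
	(void)ctx; /* running state and timers are left untouched */
}

/* Resume before the change: init directly on a context never stopped. */
bool model_resume_old(struct model_ctx *ctx)
{
	return model_init_locked(ctx);
}

/* Resume after the change: stop, then init. */
bool model_resume_new(struct model_ctx *ctx)
{
	return model_if_init_locked(ctx);
}
```

In this model, bringing a context up with model_init_locked(), calling model_suspend(), and then model_resume_old() returns false and leaves timers_armed at 2, while the same sequence ending in model_resume_new() returns true with timers_armed at 1.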
A commit references this bug:

Author: shurd
Date: Mon Jan 14 18:40:37 UTC 2019
New revision: 343024
URL: https://svnweb.freebsd.org/changeset/base/343024

Log:
  MFC r342855:
  Use iflib_if_init_locked() during resume instead of iflib_init_locked().

  iflib_init_locked() assumes that iflib_stop() has been called, however,
  it is not called for suspend. iflib_if_init_locked() calls stop then init,
  so fixes the problem.

  This was causing errors after a resume from suspend.

  PR: 224059
  Reported by: zeising
  Sponsored by: Limelight Networks

Changes:
_U stable/12/
  stable/12/sys/net/iflib.c
Is this applicable to stable/11, or what merge would make it applicable to stable/11? I am concerned that this regression is, or could end up, in the path of the upcoming 11.3 release.

Thanks,
Rod <RE
(In reply to Rodney W. Grimes from comment #11) It should be directly applicable to stable/11. I'll take a closer look now.
A commit references this bug:

Author: shurd
Date: Wed Jan 16 19:20:14 UTC 2019
New revision: 343099
URL: https://svnweb.freebsd.org/changeset/base/343099

Log:
  MFC r342855:
  Use iflib_if_init_locked() during resume instead of iflib_init_locked().

  iflib_init_locked() assumes that iflib_stop() has been called, however,
  it is not called for suspend. iflib_if_init_locked() calls stop then init,
  so fixes the problem.

  This was causing errors after a resume from suspend.

  PR: 224059
  Reported by: zeising
  Sponsored by: Limelight Networks

Changes:
_U stable/11/
  stable/11/sys/net/iflib.c
There's still no fix as of 12.0p7. Isn't it reasonable to issue an EN for this?