I am seeing a strange issue running CURRENT on KVM on Sapphire Rapids. If I create an interface with `ifconfig wg create` and then destroy the interface via `ifconfig wg0 destroy`, the process hangs deep within the kernel:

load: 1.55  cmd: ifconfig 2277 [runnable] 31.14r 0.00u 0.00s 0% 3276k
mi_switch+0x175
sched_bind+0xbc
epoch_drain_callbacks+0x179
wg_clone_destroy+0x25c
if_clone_destroyif_flags+0x69
if_clone_destroy+0xff
ifioctl+0x8d3
kern_ioctl+0x1fe
sys_ioctl+0x154
amd64_syscall+0x140
fast_syscall_common+0xf8
sched_bind() moves the calling thread to a different CPU, so here it's switched off the old CPU and waiting for a chance to run on the new one. If that's not happening quickly, it's probably because some other, higher-priority thread is monopolizing the target CPU. top -H ought to be able to confirm whether that's the case.
@markj Thanks for the clarification there. This only happens while running in a KVM guest (Ubuntu 22.04, 5.19 kernel) and does not occur when running directly on the same hardware. Additionally, this happens on an otherwise idle box.
(In reply to R. Christian McDonald from comment #2) That is pretty weird. Could you share the output of "procstat -kka" while the hang is occurring? Do you have INVARIANTS enabled?
Created attachment 242766 [details] procstat -kka
(In reply to Mark Johnston from comment #3) Yep, INVARIANTS is enabled.

# sysctl kern.conftxt | grep INVAR
options INVARIANT_SUPPORT
options INVARIANTS

# uname -a
FreeBSD SRVM-DUT-4 14.0-CURRENT FreeBSD 14.0-CURRENT #0 main-n263002-743516d51fa7: Thu May 18 08:06:33 UTC 2023 root@releng1.nyi.freebsd.org:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64
Comment on attachment 242766 [details] procstat -kka

Oh, you also need to pass -S to top. I see one thread which looks to be spinning:

    7 100267 rand_harvestq    -    vtrnd_read+0xa4 random_kthread+0x174 fork_exit+0x80 fork_trampoline+0xe
(In reply to Mark Johnston from comment #6)

last pid:  2028;  load averages:  2.00,  2.01,  1.83    up 0+00:39:17  14:42:30
515 threads:   21 running, 450 sleeping, 1 stopped, 43 waiting
CPU:  0.0% user,  0.0% nice,  5.6% system,  0.0% interrupt, 94.3% idle
Mem: 17M Active, 16M Inact, 271M Wired, 56K Buf, 7570M Free
ARC: 35M Total, 6297K MFU, 27M MRU, 359K Header, 1734K Other
     15M Compressed, 43M Uncompressed, 2.84:1 Ratio
Swap: 2048M Total, 2048M Free

  PID USERNAME    PRI NICE   SIZE    RES STATE    C   TIME     WCPU COMMAND
   11 root        187 ki31     0B   288K RUN      2  39:16  100.00% idle{idle: cpu2}
   11 root        187 ki31     0B   288K CPU13   13  39:16  100.00% idle{idle: cpu13}
   11 root        187 ki31     0B   288K CPU8     8  39:15  100.00% idle{idle: cpu8}
   11 root        187 ki31     0B   288K CPU7     7  39:15  100.00% idle{idle: cpu7}
   11 root        187 ki31     0B   288K CPU15   15  39:15  100.00% idle{idle: cpu15}
   11 root        187 ki31     0B   288K CPU10   10  39:15  100.00% idle{idle: cpu10}
   11 root        187 ki31     0B   288K CPU14   14  39:13  100.00% idle{idle: cpu14}
   11 root        187 ki31     0B   288K CPU9     9  39:12  100.00% idle{idle: cpu9}
    7 root        -16    -     0B    16K CPU4     4  39:10  100.00% rand_harvestq
   11 root        187 ki31     0B   288K CPU17   17  39:09  100.00% idle{idle: cpu17}
   11 root        187 ki31     0B   288K CPU5     5  39:17   98.97% idle{idle: cpu5}
   11 root        187 ki31     0B   288K CPU6     6  39:16   98.97% idle{idle: cpu6}
   11 root        187 ki31     0B   288K CPU3     3  39:16   98.97% idle{idle: cpu3}
   11 root        187 ki31     0B   288K CPU11   11  39:16   98.97% idle{idle: cpu11}
   11 root        187 ki31     0B   288K CPU16   16  39:15   98.97% idle{idle: cpu16}
   11 root        187 ki31     0B   288K CPU12   12  39:14   98.97% idle{idle: cpu12}
   11 root        187 ki31     0B   288K CPU1     1  39:14   98.97% idle{idle: cpu1}
   11 root        187 ki31     0B   288K CPU0     0  39:15   98.00% idle{idle: cpu0}
   12 root        -60    -     0B   400K WAIT    17   0:05    0.98% intr{swi0: uart}
    0 root        -16    -     0B  3664K swapin   0   6:02    0.00% kernel{swapper}
   11 root        187 ki31     0B   288K RUN      4   0:07    0.00% idle{idle: cpu4}
   15 root        -60    -     0B    80K -       10   0:02    0.00% usb{usbus0}
    6 root         -8    -     0B  2464K tx->tx  16   0:02    0.00% zfskern{txg_thread_enter}
   12 root        -64    -     0B   400K WAIT     9   0:01    0.00% intr{irq31: virtio_pci2}
    2 root        -60    -     0B   288K WAIT     0   0:00    0.00% clock{clock (0)}
 1974 root         21    0    17M  4896K STOP    14   0:00    0.00% top
    8 root        -16    -     0B    64K psleep  13   0:00    0.00% pagedaemon{dom0}
   12 root        -64    -     0B   400K WAIT     7   0:00    0.00% intr{irq29: virtio_pci1}
  813 ntpd         20    0    21M  5940K select   7   0:00    0.00% ntpd{ntpd}
 1597 root         20    0    13M  3476K wait    11   0:00    0.00% sh
    0 root        -12    -     0B  3664K -       15   0:00    0.00% kernel{z_wr_iss_13}
    0 root        -12    -     0B  3664K -       14   0:00    0.00% kernel{z_wr_iss_3}
    0 root        -12    -     0B  3664K -        8   0:00    0.00% kernel{z_wr_iss_9}
    0 root        -12    -     0B  3664K -       12   0:00    0.00% kernel{z_wr_iss_5}
    0 root        -12    -     0B  3664K -       16   0:00    0.00% kernel{z_wr_iss_2}
 1794 root         21    0    21M  9088K select   9   0:00    0.00% sshd
    0 root        -12    -     0B  3664K -       11   0:00    0.00% kernel{z_wr_iss_10}
  749 root         20    0    13M  3032K select  12   0:00    0.00% syslogd
    0 root        -12    -     0B  3664K -        0   0:00    0.00% kernel{z_wr_iss_11}
    0 root        -12    -     0B  3664K -       10   0:00    0.00% kernel{z_wr_iss_7}
    0 root        -12    -     0B  3664K -       12   0:00    0.00% kernel{z_wr_iss_6}
    0 root        -16    -     0B  3664K -       11   0:00    0.00% kernel{z_wr_int_2_1}
    0 root        -12    -     0B  3664K -       16   0:00    0.00% kernel{z_wr_iss_8}
    0 root        -12    -     0B  3664K -       17   0:00    0.00% kernel{z_wr_iss_12}
So this is a problem where the virtio random driver is constantly polling the host for entropy but not getting any for some reason. I'm not sure why that would be: either a bug in the virtio driver, or some kind of restrictive configuration by the hypervisor.
(In reply to Mark Johnston from comment #8) Cool. Blacklisting virtio_random works around the hang.
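For anyone else hitting this: one way to make such a workaround persistent would be FreeBSD's device-hint mechanism, assuming it applies to the vtrnd(4) instance as it does to other drivers (a sketch, not verified on the affected setup):

```shell
# /boot/loader.conf — disable the virtio RNG device at boot via a device
# hint; "vtrnd" is the virtio_random driver name, unit 0 assumed here.
hint.vtrnd.0.disabled="1"
```

After a reboot the guest should fall back to its other entropy sources and rand_harvestq should no longer spin in vtrnd_read().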
I discovered today that it is possible to reproduce this bug on non-Sapphire Rapids systems. The difference is the use of the legacy vs. the modern virtio-rng device: the legacy virtio-rng doesn't exhibit this problem, while the modern virtio-rng does. I tested on a non-Sapphire Rapids PVE host this morning and forced the virtio-rng to explicitly use the modern device (virtio-rng-pci-non-transitional) instead of the legacy (transitional) device (which is what PVE sets up by default), and the problem manifests there as well. Conversely, if I explicitly use the legacy device on the Sapphire Rapids box, the problem goes away there too.
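For anyone reproducing the two cases outside of PVE, the device selection can be made explicit on a raw QEMU command line. This is a sketch: the `rng0` object id is arbitrary, `...` stands for the rest of your usual guest options, and only the `-device` name differs between the two cases:

```shell
# Transitional (legacy-capable) virtio-rng — does NOT trigger the hang:
qemu-system-x86_64 ... \
    -object rng-random,filename=/dev/urandom,id=rng0 \
    -device virtio-rng-pci,rng=rng0

# Modern-only virtio-rng — triggers the rand_harvestq spin and the
# wg0 destroy hang described above:
qemu-system-x86_64 ... \
    -object rng-random,filename=/dev/urandom,id=rng0 \
    -device virtio-rng-pci-non-transitional,rng=rng0
```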