Bug 254514 - vnet: /sbin/ifconfig epair10b vnet $name getting stuck if one CPU is busy
Summary: vnet: /sbin/ifconfig epair10b vnet $name getting stuck if one CPU is busy
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: misc
Version: 13.0-STABLE
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: freebsd-net (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-03-23 20:56 UTC by Mina Galić
Modified: 2021-07-06 16:46 UTC
CC List: 5 users

See Also:


Description Mina Galić freebsd_triage 2021-03-23 20:56:21 UTC
This bug surfaced as a side effect of https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=254513

Jail startup gets stuck at:

/sbin/ifconfig epair10b vnet $name

Running procstat kstack shows the following for the different jails:

  913 100682 ifconfig            -                   mi_switch _sx_xlock_hard epoch_drain_callbacks if_detach_internal if_vmove ifhwioctl ifioctl kern_ioctl sys_ioctl amd64_syscall fast_syscall_common 

  835 100475 ifconfig            -                   mi_switch sched_bind epoch_drain_callbacks if_detach_internal if_vmove ifhwioctl ifioctl kern_ioctl sys_ioctl amd64_syscall fast_syscall_common 

Similarly, destroying an epair also gets stuck:

 1119 100988 ifconfig            -                   mi_switch _sx_xlock_hard epoch_drain_callbacks if_detach_internal if_detach epair_clone_destroy if_clone_destroyif if_clone_destroy ifioctl kern_ioctl sys_ioctl amd64_syscall fast_syscall_common 

Given that this is a side effect of one CPU core being 100% busy, does this mean that draining callbacks needs all CPUs?
Comment 1 Andrew "RhodiumToad" Gierth 2021-03-24 15:16:01 UTC
(In reply to Mina Galić from comment #0)

> given that this is a side-effect of 1 CPU core being 100% busy, does this mean that draining callbacks needs all CPUs?

Yes, it does. Draining epoch callbacks works by having the draining thread bind itself to each CPU in turn, so it blocks if, for any reason, it cannot get any runtime on some CPU (for example, if that CPU is 100% busy at a priority that does not let the draining thread preempt it).
Comment 2 Mark Johnston freebsd_committer freebsd_triage 2021-03-24 15:29:15 UTC
I tried to fix this a while ago with https://reviews.freebsd.org/D24621

I'll get back to it ASAP.
Comment 3 Jeffrey Gelens 2021-06-28 08:10:42 UTC
This problem hits me as well:

```
load: 1.48  cmd: ifconfig 1272 [runnable] 18.73r 0.00u 0.00s 0% 3020k
mi_switch+0xc1 sched_bind+0x74 epoch_drain_callbacks+0x15c if_detach_internal+0x60 if_vmove+0x3c ifhwioctl+0x1013 ifioctl+0x50c kern_ioctl+0x26d sys_ioctl+0xf6 amd64_syscall+0x10c fast_syscall_common+0xf8
```

Any luck on the fix?
Comment 4 Bjoern A. Zeeb freebsd_committer freebsd_triage 2021-07-06 16:46:14 UTC
Is this also related to PR227100?