Killing pid 11, as root, like this: # kill -KILL 11 ... results in wedged io, until reboot - "diskinfo -t /dev/ada0" shows access times from 0.1 msec immediately go to 44 msec. Pid 11 is "idle".
(In reply to Edward Tomasz Napierala from comment #0) It is very _weird_ bug report from developer. You did not invesigated what is the state of the idle (?) process after the kill, obviously pid value does not matter since kernel starts variable number of the kprocs during the boot (on my kernel idle has pid 10). And last and most important, kernel processes do not process signals: there is no place where cursig()/postsig() pair is called, since there is no return from kernel to user mode, and no ast handler called. E.g. after I do kill -9 10 (pid 10 is my idle process), I see sandy% sudo procstat -i 10 PID COMM SIG FLAGS 10 idle KILL P-- I.e. SIGKILL was put into the queue, but nothing processed it. And I do not observe any weirdness in the system behaviour afterward. If your system consumed the SIGKILL, there should be some code which called postsig() in the context of the idle threads. FYI, the idle loop is sched_idletd(), private for the given scheduler implementation.
The report is weird, because the bug is. I have no idea what could be causing this, and how to approach debugging it. More data points: the slowdown is there also for md(4). You can make things go faster by holding "enter" at the terminal; basically try this: kill -KILL 11 mdconfig -s1g while :; do dd if=/dev/md0 of=/dev/null bs=1m count=256; done and then see what happens when you hold down the enter key, and otherwise.
(In reply to Edward Tomasz Napierala from comment #2) No, bug report is weird because it does not contain even an attempt to get an overview of the system state after it enters the un-understandable condition. ps auxwwH, procstat, looking for the threads eating 100% cpus, stopped clocks, whatever. As I said, I killed my idle process and did not see any change in the system behavior. The 'enter' key ticking some processing might indicate problems with stopped event clocks or clock interrupts. E.g. network interrupts coming in could have similar effect ? Anyway, collect as much as possible of the overview information for the system before and after the killing, and see what changed.
The reason I didn't provide more information is that I believed it to be trivially reproducible - and also because I can't tell much about it. That is, everything looks normal: the system is almost idle, vmstat/iostat/top etc don't show anything weird... It's just that "diskinfo -t" shows values orders of magnitude worse than usual. Actually, I've noticed something: idle threads priority. Before: # procstat -t 11 PID TID COMM TDNAME CPU PRI STATE WCHAN 11 100003 idle idle: cpu0 -1 255 run - 11 100004 idle idle: cpu1 1 255 run - # kill -KILL 11 [root@brick:/home/trasz]# procstat -t 11 PID TID COMM TDNAME CPU PRI STATE WCHAN 11 100003 idle idle: cpu0 0 120 run - 11 100004 idle idle: cpu1 -1 255 run - Bumping the idle thread's priority would explain what's happening. But why does it get bumped?
(In reply to Edward Tomasz Napierala from comment #4) Great, you nailed it down. Look at the tdsigwakeup(), which is called from the main code to send a signal to a thread tdsendsignal(): /* * Bring the priority of a thread up if we want it to get * killed in this lifetime. */ if (action == SIG_DFL && (prop & SA_KILL) && td->td_priority > PUSER) sched_prio(td, PUSER);
Created attachment 157247 [details] Do not bump idle thread priority
Created attachment 157248 [details] Do not bump idle thread priority Slight improvement, add a safety brake for the unreachable code.
(In reply to Konstantin Belousov from comment #7) And to express my opinion on the whole stuff. The signal delivery to the kernel threads must be opt-in feature. Kernel thread should explicitely declare the ability to handle signals directed to it. E.g., nfsd threads check for signal as an indication of exit request. Most threads do not handle signals at all, and queuing the signal to them causes odd side-effects. Most innocent consequence is the memory leak due to queued ksiginfo, which is never deleted from the sigqueue. Or, the priority bump for the idle thread, as you discovered. Code to prevent even queuing signals to the kernel threads is trivial, but it requires careful examination of each call to kproc/kthread creation to decide should the signalling be allowed. The patch I provided IMO is good stop-gap measure which fixes the immediate ugly case.
The patch looks good to me as an interim solution to avoid the bizarre behaviour. I think we should put a comment explaining it though (unless there's a plan to implement the opt-in signal delivery in the short term).
The patch fixes the problem, thanks! I agree about the long-term strategy too.
A commit references this bug: Author: kib Date: Fri May 29 16:26:08 UTC 2015 New revision: 283745 URL: https://svnweb.freebsd.org/changeset/base/283745 Log: When delivering a signal with default disposition to the thread, tdsigwakeup() increases the priority of the low-priority threads, to give them a chance to be terminated timely. Also, kernel allows user to signal kernel processes. The combined effect is that signalling idle process bump a priority of the selected delivery thread, which starts eating CPU. Check for the delivery thread be an idle thread and do not raise its priority then. The signal delivery to the kernel threads must be opt-in feature. Kernel thread should explicitely declare the ability to handle signals directed to it. E.g., nfsd threads check for signal as an indication of exit request. Most threads do not handle signals at all, and queuing the signal to them causes odd side-effects. Most innocent consequence is the memory leak due to queued ksiginfo, which is never deleted from the sigqueue. Code to prevent even queuing signals to the kernel threads is trivial, but it requires careful examination of each call to kproc/kthread creation to decide should the signalling be allowed. The commit is a stop-gap measure which fixes the immediate case for now. PR: 200493 Reported and tested by: trasz Discussed with: trasz, emaste Sponsored by: The FreeBSD Foundation MFC after: 1 week Changes: head/sys/kern/kern_sig.c