Summary: intr_event_handle is unsafe with respect to interrupt handler list
Product: Base System
Component: kern
Status: Closed FIXED
Severity: Affects Only Me
Priority: ---
Version: CURRENT
Hardware: Any
OS: Any
Reporter: Andriy Gapon <avg>
Assignee: Andriy Gapon <avg>
CC: cem, imp, jhb, kib
Description
Andriy Gapon
2018-06-18 11:36:38 UTC
This is an old race.

I have a very old p4 branch that fixes this: it dedicates a spin lock to protecting the list of handlers (and then uses that lock as the thread_lock of idle interrupt threads, so that the number of spin locks taken for an interrupt remains the same).

Yes, it's an old one. I think that using a spinlock is perfectly fine. I also have a work in progress that tries to solve the problem with some atomic magic, so that intr_event_handle is completely wait/spin free (if the interrupt thread does not need to run, of course). I guess that the difference between those two approaches would be negligible.

Just FYI for next time: if you're making code changes, fail_point(9) can make this kind of scenario much easier to hit. In particular, you can widen races by adding an empty KFAIL_POINT_CODE() and then inducing a delay with sysctl my.failpoint="50%delay(1000)" (i.e., DELAY(1000) 50% of the times we hit that failpoint; the 50% can help avoid excessive foot-shooting). Is this list a good candidate for epoch(9)-based reclamation? :-)

I suspect you cannot use epoch(9) in primary interrupt context (e.g. filters), just as you can't use regular mutexes there. But a similar approach might work.

Where does epoch(9) take a regular mtx? I may just be missing it, but I see only spin mutexes.

(In reply to Conrad Meyer from comment #7)
I don't know the epoch(9) / ck_epoch implementation details, but given the constraints[*] of interrupt handling, I think that it could be overkill. I mean, it probably has some overhead that's avoidable, and for that latency-sensitive code it could matter.

[*] The interrupt handler list can be iterated only by a single thread / context at a time. Also, we reduce modifications to a single thread at a time using a mutex. Performance of adding and removing interrupt handlers is not critical.

(In reply to Andriy Gapon from comment #8)
Yes, performance of add/remove doesn't matter.
But performance of intr_event_handle() matters very much, and adding a contested spin lock to it seems like a bad idea. I suppose it only needs to be taken for shared interrupts, not MSI, so maybe the impact is not so bad.

To be clear, we are not talking about adding a spin mutex. We are talking about replacing the existing sched_lock mutex, which is used to schedule the ithread anyway, with a dedicated spin mutex that protects the list of handlers and can be used as the thread lock when scheduling the ithread, since ithreads are changed in this patch to set td_lock to the new mutex when they are idle. You can't get away from taking a mutex to call sched_add(), but this lets you change, in essence, which lock is used for that.

Sure. It'd still be a single global spin lock, right?

No, it is kind of per-IRQ, not global.

Oh, that's much better.

I have created https://reviews.freebsd.org/D15905
It's a hybrid between the approach I mentioned in comment #2, what John mentioned (comment #1 and others), and what the current code has.

A commit references this bug:

Author: avg
Date: Fri Aug 3 14:27:29 UTC 2018
New revision: 337255
URL: https://svnweb.freebsd.org/changeset/base/337255

Log:
safer wait-free iteration of shared interrupt handlers

The code that iterates a list of interrupt handlers for a (shared) interrupt, whether in the ISR context or in the context of an interrupt thread, does so in a lock-free fashion. Thus, the routines that modify the list need to take special steps to ensure that the iterating code has a consistent view of the list. Previously, those routines tried to play nice only with the code running in the ithread context. The iteration in the ISR context was left to chance.

After commit r336635, atomic operations and memory fences are used to ensure that the ie_handlers list is always safe to navigate with respect to insertion and removal of list elements. There is still a question of when it is safe to actually free a removed element.
The idea of this change is somewhat similar to the idea of epoch-based reclamation, with some simplifications compared to the general scheme. All writers are serialized using a mutex, so we do not need to worry about concurrent modifications. Also, all read accesses from the open context are serialized too. So we can get away with just two epochs / phases.

When a thread removes an element, it switches the global phase from the current phase to the other and then drains the previous phase. Only after the draining does the removed element actually get freed. The code that iterates the list in the ISR context takes a snapshot of the global phase and then increments the use count of that phase before iterating the list. The use count (of the same phase) is decremented after the iteration. This ensures that no iteration can still be over the removed element when it gets freed.

This commit also simplifies the coordination with the interrupt thread context. Now we always schedule the interrupt thread when removing one of the handlers for its interrupt. This makes the code both simpler and safer, as the interrupt thread masks the interrupt, thus ensuring that there is no interaction with the ISR context.

P.S. This change matters only for shared interrupts, and I realize that those are becoming a thing of the past (and quickly). I also understand that the problem that I am trying to solve is extremely rare.

PR: 229106
Reviewed by: cem
Discussed with: Samy Al Bahra
MFC after: 5 weeks
Differential Revision: https://reviews.freebsd.org/D15905

Changes:
head/sys/kern/kern_intr.c
head/sys/sys/interrupt.h