When testing review D47392, I found the following: # service ipfilter stop calling _callout_stop_safe with the following non-sleepable locks held: shared rw ipf filter load/unload mutex (ipf filter load/unload mutex) r = 0 (0xffff0000417c7530) locked @ /usr/src/sys/netpfil/ipfilter/netinet/fil.c:7926 stack backtrace: #0 0xffff00000052d394 at witness_debugger+0x60 #1 0xffff00000052e620 at witness_warn+0x404 #2 0xffff0000004d4ffc at _callout_stop_safe+0x8c #3 0xffff0000f7236674 at ipfdetach+0x3c #4 0xffff0000f723fa4c at ipf_ipf_ioctl+0x788 #5 0xffff0000f72367e0 at ipfioctl+0x144 #6 0xffff00000034abd8 at devfs_ioctl+0x100 #7 0xffff0000005c66a0 at vn_ioctl+0xbc #8 0xffff00000034b2cc at devfs_ioctl_f+0x24 #9 0xffff0000005331ec at kern_ioctl+0x2e0 #10 0xffff000000532eb4 at sys_ioctl+0x140 #11 0xffff000000880480 at do_el0_sync+0x604 #12 0xffff0000008579ac at handle_el0_sync+0x4c I realized that WRITE_ENTER/RWLOCK_EXIT is not needed for ipfdetach(). Not sure if it should be removed from fil.c and/or used only for ipfattach() in ip_fil_freebsd.c. # uname -mnr freebsd-15-0 15.0-CURRENT arm64
To fix this, I think it'd be sufficient to: 1. Use callout_init_rw() to initialize ipf_slow_ch, and remove the locking in ipf_timer_func(). 2. Use callout_stop() instead of callout_drain(). This assumes that the global ipf lock is not dropped while executing ipf_timer_func(), which I believe is the case. If so, then callout_stop() can cancel a pending callout without sleeping, avoiding the problem which triggers that warning.
Created attachment 254881 [details] ipfilter: Avoid stopping with a lock held Proposed patch based on comment #1.
(In reply to Jose Luis Duran from comment #2) This looks correct to me.
The current patch triggers the following panic when running the tests: panic: Lock ipf filter load/unload mutex not exclusively locked @ /usr/src/sys/kern/kern_rwlock.c:166 cpuid = 7 time = 1730590683 KDB: stack backtrace: db_trace_self() at db_trace_self db_trace_self_wrapper() at db_trace_self_wrapper+0x38 vpanic() at vpanic+0x1a0 panic() at panic+0x48 __rw_assert() at __rw_assert+0xc8 _callout_stop_safe() at _callout_stop_safe+0x6c ipfdetach() at ipfdetach+0x3c vnet_ipf_uninit() at vnet_ipf_uninit+0x68 vnet_destroy() at vnet_destroy+0x15c prison_deref() at prison_deref+0xbec sys_jail_remove() at sys_jail_remove+0x114 do_el0_sync() at do_el0_sync+0x604 handle_el0_sync() at handle_el0_sync+0x4c --- exception, esr 0x56000000 KDB: enter: panic [ thread pid 17264 tid 100676 ] Stopped at kdb_enter+0x48: str xzr, [x19, #2048] I'll look into it more closely.
This looks good to me. Approved. Please MFC after a week or two.
(In reply to Cy Schubert from comment #5) Looking at this a little more, this is wrong. I will rebuild and reproduce here.
Created attachment 254890 [details] WIP: As described in comment #0 As shown in comment #3, the patch panics. The following WIP is what I originally had, it works, but is not a real fix I guess. I'll keep looking. Thank you!
(In reply to Jose Luis Duran from comment #7) This may be better, not sure yet as I haven't had time to look at it further; we share patches like this (not specific to FreeBSD) with NetBSD. Looking at their fil.c, there are no commits addressing this bug, assuming it is there. We're breaking new ground. I've enabled invariants and witness on my production firewall, and will test.
(In reply to Jose Luis Duran from comment #4) Ah, the code which initializes the callout and schedules it also needs to be holding the write lock. Also, you can initialize the callout with CALLOUT_SHAREDLOCK to preserve the existing behaviour of acquiring the read lock in the callout handler.
(In reply to Mark Johnston from comment #9) I'll look at it later today. Testing was sidetracked by the "nfs, rpc" lock panic from last night. I'll be mostly AFK all day today.
Created attachment 254917 [details] ipfilter: Avoid stopping with a lock held Update patch accordingly.
Created attachment 254918 [details] ipfilter: Avoid stopping with a lock held Previous patch had a gratuitous extra white space. Sorry.
(In reply to Jose Luis Duran from comment #12) LGTM & it fixes the problem! Approved.
Please MFC this after a week.
Will you please commit this.
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=1fa6daaafd74c1a457dcfe26e0a5943b5441dc9d commit 1fa6daaafd74c1a457dcfe26e0a5943b5441dc9d Author: Jose Luis Duran <jlduran@FreeBSD.org> AuthorDate: 2024-11-02 17:58:59 +0000 Commit: Jose Luis Duran <jlduran@FreeBSD.org> CommitDate: 2024-11-12 03:46:15 +0000 ipfilter: Avoid stopping with a lock held Avoid calling _callout_stop_safe with a non-sleepable lock held when detaching by initializing callout_init_rw() with CALLOUT_SHAREDLOCK. It avoids the following WITNESS warning when stopping the service: # service ipfilter stop calling _callout_stop_safe with the following non-sleepable locks held: shared rw ipf filter load/unload mutex (ipf filter load/unload mutex) r = 0 (0xffff0000417c7530) locked @ /usr/src/sys/netpfil/ipfilter/netinet/fil.c:7926 stack backtrace: #0 0xffff00000052d394 at witness_debugger+0x60 #1 0xffff00000052e620 at witness_warn+0x404 #2 0xffff0000004d4ffc at _callout_stop_safe+0x8c #3 0xffff0000f7236674 at ipfdetach+0x3c #4 0xffff0000f723fa4c at ipf_ipf_ioctl+0x788 #5 0xffff0000f72367e0 at ipfioctl+0x144 #6 0xffff00000034abd8 at devfs_ioctl+0x100 #7 0xffff0000005c66a0 at vn_ioctl+0xbc #8 0xffff00000034b2cc at devfs_ioctl_f+0x24 #9 0xffff0000005331ec at kern_ioctl+0x2e0 #10 0xffff000000532eb4 at sys_ioctl+0x140 #11 0xffff000000880480 at do_el0_sync+0x604 #12 0xffff0000008579ac at handle_el0_sync+0x4c PR: 282478 Suggested by: markj Reviewed by: cy Approved by: emaste (mentor) MFC after: 1 week sys/netpfil/ipfilter/netinet/ip_fil_freebsd.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=a0618fbe19dfedcdf01b4c232fe6669ae19505c4 commit a0618fbe19dfedcdf01b4c232fe6669ae19505c4 Author: Jose Luis Duran <jlduran@FreeBSD.org> AuthorDate: 2024-11-12 18:51:45 +0000 Commit: Jose Luis Duran <jlduran@FreeBSD.org> CommitDate: 2024-11-12 18:53:39 +0000 Revert "ipfilter: Avoid stopping with a lock held" The timeout function still tries to acquire the rwlock, and now it deadlocks, since the callout framework will have already acquired it. This reverts commit 1fa6daaafd74c1a457dcfe26e0a5943b5441dc9d. PR: 282478 Reported by: markj Approved by: emaste (mentor) sys/netpfil/ipfilter/netinet/ip_fil_freebsd.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=650900cc2f607458d32d333bd7ab0aa10be13ba4 commit 650900cc2f607458d32d333bd7ab0aa10be13ba4 Author: Jose Luis Duran <jlduran@FreeBSD.org> AuthorDate: 2024-11-12 21:08:50 +0000 Commit: Jose Luis Duran <jlduran@FreeBSD.org> CommitDate: 2024-11-12 21:31:24 +0000 ipfilter: Avoid holding a lock while stopping Avoid calling _callout_stop_safe with a non-sleepable lock held when detaching by initializing callout_init_rw() with CALLOUT_SHAREDLOCK, and avoiding re-initialization inside the timer function. PR: 282478 Reviewed by: cy, emaste, jhb, markj Tested by: cy Approved by: emaste (mentor) MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D47530 sys/netpfil/ipfilter/netinet/ip_fil_freebsd.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-)
A commit in branch stable/14 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=8b29ab3aff13366f1032ec28611db5200e3badf5 commit 8b29ab3aff13366f1032ec28611db5200e3badf5 Author: Jose Luis Duran <jlduran@FreeBSD.org> AuthorDate: 2024-11-02 17:58:59 +0000 Commit: Jose Luis Duran <jlduran@FreeBSD.org> CommitDate: 2024-11-19 02:32:59 +0000 ipfilter: Avoid holding a lock while stopping Avoid calling _callout_stop_safe with a non-sleepable lock held when detaching by initializing callout_init_rw() with CALLOUT_SHAREDLOCK, and avoiding re-initialization inside the timer function. PR: 282478 Reviewed by: cy, emaste, jhb, markj Tested by: cy Approved by: emaste (mentor) MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D47530 (cherry picked from commit 1fa6daaafd74c1a457dcfe26e0a5943b5441dc9d) (cherry picked from commit a0618fbe19dfedcdf01b4c232fe6669ae19505c4) (cherry picked from commit 650900cc2f607458d32d333bd7ab0aa10be13ba4) sys/netpfil/ipfilter/netinet/ip_fil_freebsd.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-)
A commit in branch stable/13 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=5006b9b4223c9a6620ce6513cf5f14161bbafd81 commit 5006b9b4223c9a6620ce6513cf5f14161bbafd81 Author: Jose Luis Duran <jlduran@FreeBSD.org> AuthorDate: 2024-11-02 17:58:59 +0000 Commit: Jose Luis Duran <jlduran@FreeBSD.org> CommitDate: 2024-11-19 02:36:57 +0000 ipfilter: Avoid holding a lock while stopping Avoid calling _callout_stop_safe with a non-sleepable lock held when detaching by initializing callout_init_rw() with CALLOUT_SHAREDLOCK, and avoiding re-initialization inside the timer function. PR: 282478 Reviewed by: cy, emaste, jhb, markj Tested by: cy Approved by: emaste (mentor) MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D47530 (cherry picked from commit 1fa6daaafd74c1a457dcfe26e0a5943b5441dc9d) (cherry picked from commit a0618fbe19dfedcdf01b4c232fe6669ae19505c4) (cherry picked from commit 650900cc2f607458d32d333bd7ab0aa10be13ba4) sys/netpfil/ipfilter/netinet/ip_fil_freebsd.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-)