Bug 268086 - spin lock held too long in icmp6_rip6_input
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 13.1-RELEASE
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-net (Nobody)
Keywords: crash, needs-qa
Reported: 2022-11-30 17:58 UTC by Kajetan Staszkiewicz
Modified: 2022-12-07 07:16 UTC

Attachments
kgdb output (9.75 KB, text/plain)
2022-11-30 17:58 UTC, Kajetan Staszkiewicz
A kernel panic on a system with enabled debugging options. (4.45 KB, text/plain)
2022-12-05 15:34 UTC, Kajetan Staszkiewicz

Description Kajetan Staszkiewicz 2022-11-30 17:58:52 UTC
Created attachment 238458
kgdb output

Hello,

I had this kernel crash on 2 of my routers running a GENERIC 13.1-RELEASE-p3 kernel. It is probably related to adding some new tunnels, starting BGP sessions with BIRD and adding some routes. A few hours after this change, both routers crashed just minutes apart with identical stack traces. Since the change was undone, the routers have been running stably for days.

I understand that icmp6_rip6_input's job is to deliver ICMPv6 packets to open raw sockets. In the case of my routers, the raw sockets would be held by radvd, fping (from smokeping) and a Python program I use as a smokeping replacement.
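
For illustration only, here is a minimal Python sketch of how a userland ping tool might come to hold such a raw ICMPv6 socket. It is not the actual program running on these routers; all function names are hypothetical. It assumes the RFC 3542 behaviour for raw ICMPv6 sockets: the kernel fills in the checksum on send, and received data starts at the ICMPv6 header. Unless the program installs an ICMP6_FILTER, a socket like this also gets a copy of every other ICMPv6 message the host receives, including Neighbor Solicitations.

```python
import socket
import struct

# ICMPv6 message types (RFC 4443 and RFC 4861).
ICMP6_TYPE_NAMES = {
    128: "echo request",
    129: "echo reply",
    133: "router solicitation",
    134: "router advertisement",
    135: "neighbor solicitation",
    136: "neighbor advertisement",
}


def build_echo_request(ident, seq, payload=b""):
    """Build an ICMPv6 Echo Request (hypothetical helper).

    The checksum field is left at 0 on the assumption that the kernel
    computes it for raw ICMPv6 sockets, as described in RFC 3542.
    """
    header = struct.pack("!BBHHH", 128, 0, 0, ident & 0xFFFF, seq & 0xFFFF)
    return header + payload


def parse_icmp6_header(packet):
    """Return (type, code, name) for data read from a raw ICMPv6 socket."""
    if len(packet) < 4:
        raise ValueError("truncated ICMPv6 header")
    icmp_type, code, _checksum = struct.unpack("!BBH", packet[:4])
    return icmp_type, code, ICMP6_TYPE_NAMES.get(icmp_type, "other")


def open_raw_icmp6_socket():
    """Open a raw ICMPv6 socket; this needs root privileges.

    Every socket opened this way is a candidate receiver when the kernel
    delivers an incoming ICMPv6 packet to raw sockets.
    """
    return socket.socket(socket.AF_INET6, socket.SOCK_RAW, socket.IPPROTO_ICMPV6)
```

A tool built along these lines would call open_raw_icmp6_socket() once, send packets from build_echo_request() with sendto(), and pass whatever recvfrom() returns to parse_icmp6_header() to discard anything that is not an echo reply.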

Looking at the mbuf in frames 33 (icmp6_rip6_input), 34 (icmp6_input) and 35 (ip6_input), I can see ICMPv6 Neighbor Solicitation messages.

Please find attached kgdb bt output. I have the whole memory dump and I can get other data from it if necessary.
Comment 1 Kajetan Staszkiewicz 2022-12-05 15:34:55 UTC
Created attachment 238543
A kernel panic on a system with enabled debugging options.
Comment 2 Kajetan Staszkiewicz 2022-12-05 15:36:52 UTC
Hello,

I've compiled a kernel with lock debugging enabled and got a different panic this time, with the message "panic: thread 0xfffffe018d823020 still in interruptible sleep?" generated by a KASSERT in sleepq_check_signals in sys/kern/subr_sleepqueue.c.

Please find another kgdb output attached.
Comment 3 Graham Perrin freebsd_committer freebsd_triage 2022-12-06 16:08:42 UTC
Triage (tentative): freebsd-net@

To the reporter: is the different panic in comment #0 reproducible, on the same computers, with an up-to-date OS?
Comment 4 Kajetan Staszkiewicz 2022-12-06 17:16:33 UTC
The crash happened on the following kernel versions:
- 2x 13.1-RELEASE-p3 GENERIC 
- 1x 13.1-RELEASE-p4 custom kernel with KASSERT enabled

As far as I can understand the stack traces, the issue always happens in a thread operating on a raw ICMPv6 socket (radvd in one case, a Python-based ping tool in another).