Created attachment 221859: Backtrace. I am encountering a kernel panic with kernel: FreeBSD 13.0-ALPHA2 stable/13-c256207-gd1c39af0ec3. It does not happen with the kernel before: bc7ee8e5bc555c246bad8bbb9cdf964fa0a08f41
Used sysctls:
kern.ipc.maxsockbuf=33554432
net.inet.tcp.sendbuf_inc=32768
net.inet.tcp.cc.algorithm="htcp"
net.inet.tcp.cc.htcp.rtt_scaling=0
net.inet.tcp.cc.htcp.adaptive_backoff=1
# Required for proper PF operation
kern.timecounter.hardware="HPET"
# Kernel TLS
kern.ipc.mb_use_ext_pgs=1
kern.ipc.tls.ifnet.permitted=1
kern.ipc.tls.enable=1
Assigning this to Richard, since he authored the patch. Leaving the bug for him. Initial question: do you have a way to reproduce this? Are there steps to follow to recreate this locally?
This is somewhat difficult. We are not able to reproduce it artificially on our test system; we need "live" traffic on our 100Gbit link, but very little streaming traffic from various systems (Linux, Windows, Android, ...) is enough to trigger it. The panic happens somewhere between 3 and 30 minutes after the system with the "faulty" kernel goes online, with only a few hundred megabits to single-digit gigabits of traffic. It is a pity that we have not set up a swap partition large enough for a kernel dump, so I just have the backtrace from a screenshot of a debug-enabled kernel. If there is anything else I can do to help, maybe we could redesign the boot disk to have a larger partition to be able to store a kernel dump.
I'm not sure how one ends up in PRR_partialack without recover_fs being initialized. Potentially with ACK reordering (unlikely), or a spurious RTO rollback (where TF_FASTRECOVERY may be set, but recover_fs was already cleared). Do you observe a non-zero "data packets unnecessarily retransmitted" counter in the output of netstat -snp tcp? https://reviews.freebsd.org/D28326 has a patch to fix the div/0 in that section of code, although it would be interesting to know explicitly how the system ends up there. If the frequency of the above counter incrementing (with no more panics once the patch is applied) matches the typical runtime before the panic happened, it likely has to do with RTO rollbacks.
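To illustrate the div/0 discussed above, here is a minimal user-space sketch of the proportional step of PRR as described in RFC 6937. It is not the code from D28326: the struct, its field names, the function name prr_sndcnt, and the choice to clamp a zero recover_fs to 1 are all assumptions made for illustration only.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the per-connection PRR state. The field names
 * are illustrative and do not mirror the kernel's actual struct layout.
 */
struct prr_state {
	uint32_t recover_fs;    /* bytes in flight when recovery started */
	uint32_t prr_delivered; /* bytes newly delivered during recovery */
	uint32_t prr_out;       /* bytes sent during recovery */
};

/*
 * Proportional part of PRR (RFC 6937):
 *   sndcnt = ceil(prr_delivered * ssthresh / RecoverFS) - prr_out
 * If recover_fs was never initialized, or was already cleared, the divisor
 * is zero. Clamping it to 1 (an assumed guard) avoids the divide trap.
 */
static int64_t
prr_sndcnt(const struct prr_state *s, uint32_t ssthresh)
{
	uint64_t fs = s->recover_fs > 0 ? s->recover_fs : 1;
	uint64_t num = (uint64_t)s->prr_delivered * ssthresh;
	/* Integer ceiling division, then subtract what was already sent. */
	int64_t cnt = (int64_t)((num + fs - 1) / fs) - (int64_t)s->prr_out;

	return (cnt > 0 ? cnt : 0);
}
```

With recover_fs at zero the unguarded form would divide by zero; with the clamp the function returns a large but finite send count, which the caller would still be expected to bound by the congestion window.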
Martin, are you able to test whether D28326 fixes the issue? If you can't, that is fine, but if you can, it would be great to know the result.
https://cgit.FreeBSD.org/src/commit/?id=6a376af0cd212be4e16d013d35a0e2eec1dbb8ae should fix this issue.
https://cgit.FreeBSD.org/src/commit/?id=76dd854f47f4aea703093647a158f280d383ea6d fixes it in stable/13. Therefore closing the issue.