If SCTP over IPv6 is used in server mode on FreeBSD and a client connects from an external device (I used Linux/Ubuntu), the kernel panics and the FreeBSD host reboots.
Steps to reproduce:
On the FreeBSD host:
    ncat --sctp -l <IPv6 address of localhost> <port #>
On an external machine such as Ubuntu:
    ncat --sctp <IPv6 address of FreeBSD host> <port # of listener on FreeBSD above>
The FreeBSD host panics instantly. Running the same in IPv4 mode, or reversing the flow (i.e. FreeBSD client to Linux server), works fine. This does not seem to be the ICMPv6 issue that recently showed up. Also, I have applied all patches via "freebsd-update fetch".
Can you provide the panic message and backtrace?
Created attachment 188676: Crash info file
The stacktrace from the vmcore is:
Fatal double fault:
eip = 0xc06dcf21
esp = 0xeeafbfd8
ebp = 0xeeafc1b8
cpuid = 1; apic id = 02
panic: double fault
cpuid = 1
KDB: stack backtrace:
#0 0xc0bc1b3e at kdb_backtrace+0x4e
#1 0xc0b8472e at vpanic+0x10e
#2 0xc0b84614 at panic+0x14
#3 0xc10985d9 at dblfault_handler+0x99
Uptime: 1m27s
Physical memory: 2019 MB
Will that help?
I just tested between two FreeBSD head machines using IPv6 link-local addresses, and ncat from the nmap port works fine. Will test Linux to FreeBSD 11 later today...
The issue seems to occur only if you use an actual IPv6 address, i.e. not ::1. I could not connect from an external machine when the listener was on ::1, so the listener has to be listening on the host's own IPv6 address. It is likely some compatibility issue between Linux SCTP and BSD SCTP, so the issue probably shows up when the other machine is a Linux machine. I have seen this happen about 10 times now, and it is immediate. The connection seems to be established, since the Linux machine shows as connected, and then the FreeBSD machine reboots.
(In reply to Shreesh Holla from comment #5) Can you please state the arguments you use on the server side for starting ncat?
(In reply to Michael Tuexen from comment #6) Oh, I mentioned that right at the beginning of the bug report. Here it is again: ncat --sctp -l <IPv6 address of FreeBSD host> <port #>
(In reply to Shreesh Holla from comment #7) Is it a global IPv6 address or a link-local one? I have been testing with link-local addresses...
@Michael, one other thing to consider is that Shreesh is running i386, which uses a smaller KSTACK_PAGES default (2) than amd64 (4). A double fault is consistent with overrunning the end of the stack. It's possible there is a large stack buffer somewhere in SCTP causing the crash. (Or maybe 2 pages just isn't enough anymore for i386: words and pointers are half the size compared to amd64, but buffers and fixed-width integers are not.) You might try bringing up an i386 VM and reproducing it there, if you aren't already. @Shreesh, is your machine amd64-capable? If so, you might try to reproduce the issue on an amd64 installation.
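For reference, the per-thread kernel stack sizes implied by these defaults can be worked out as below. This is a minimal arithmetic sketch, assuming the standard 4096-byte x86 page size; the names `PAGE_SIZE`, `KSTACK_PAGES`, and `kstack_bytes` are illustrative Python names for this note, not values read from a running kernel.

```python
# Illustrative arithmetic only. PAGE_SIZE is an assumption (standard 4 KiB
# x86 pages); the page counts are the defaults quoted in this thread.
PAGE_SIZE = 4096

KSTACK_PAGES = {"i386": 2, "amd64": 4}


def kstack_bytes(arch: str) -> int:
    """Per-thread kernel stack size in bytes for the given architecture."""
    return KSTACK_PAGES[arch] * PAGE_SIZE


for arch in KSTACK_PAGES:
    size = kstack_bytes(arch)
    print(f"{arch}: {size} bytes ({size // 1024} KiB)")
```

Under these assumptions, every kernel code path on i386 has to fit in half the stack space that the same path gets on amd64.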
I agree with Conrad. Please add to your /boot/loader.conf:
    kern.kstack_pages=4
Then reboot, try to reproduce the problem, and report back. See also https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=219476
(In reply to Eugene Grosbein from comment #10) Yes, this did the trick: adding the setting to loader.conf as Eugene suggested. Also, @Michael, as an FYI: I am using global IPv6 addresses. The bug that Eugene referenced seems to describe the problem very clearly too. @Conrad, one comment I have is about i386 using 2 kernel stack pages. Unless there is a good reason to keep that, would it not be better to sync it with the amd64 value? Perhaps it is tuned down to accommodate lower-end devices, but if that's the case, at least this SCTP scenario may be unusable on such devices.
(In reply to Shreesh Holla from comment #11) Well, pointers and native words *are* smaller on i386, so it isn't totally unreasonable for the stack size to be smaller than amd64. Also, i386-only devices tend to be older and have smaller amounts of memory available. I think the long term solution is fixing the SCTP (and other) code to not use so much stack memory. However, bumping KSTACK_PAGES to maybe 3 by default on i386 could be reasonable. I don't remember the resolution of the last argument about KSTACK_PAGES on x86 :-).
There is no reason to keep kstack_pages < 4 on i386, with the exception of a very specific load pattern in which you have an enormous number of threads in the system.
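To put the thread-count exception in rough numbers, here is a minimal sketch of how much additional memory goes to kernel stacks when moving from 2 to 4 pages per thread. It assumes 4096-byte pages and ignores guard pages and any other per-thread overhead; the thread counts and function names are hypothetical, chosen only for illustration.

```python
# Rough estimate only: assumes 4 KiB pages and counts nothing but the
# stack pages themselves (no guard pages, no other per-thread structures).
PAGE_SIZE = 4096


def kstack_total_bytes(threads: int, kstack_pages: int) -> int:
    """Total bytes of kernel stack for `threads` threads."""
    return threads * kstack_pages * PAGE_SIZE


def extra_cost_mib(threads: int, old_pages: int = 2, new_pages: int = 4) -> float:
    """Additional MiB of kernel stack when raising the per-thread page count."""
    extra = kstack_total_bytes(threads, new_pages) - kstack_total_bytes(threads, old_pages)
    return extra / (1024 * 1024)


for threads in (500, 5_000, 50_000):  # hypothetical thread counts
    print(f"{threads} threads: +{extra_cost_mib(threads):.1f} MiB")
```

The cost scales linearly with the number of threads, which is why it only becomes a concern on systems with a very large thread count.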
(In reply to Conrad Meyer from comment #12) @conrad - I see what you mean, since i386 means 32-bit. And yes, fixing the SCTP code to not use that much stack is definitely the right fix. From what I saw, the crash was immediate, so it likely uses a lot of stack for each stream and maybe each association. I don't know the implementation, but it may be difficult to remove that need. But as @Eugene said, there is no requirement to keep it less than 4. My opinion is that the default should be changed to 3 or 4. Lower-end systems can configure a lower number, knowing that things like IPv6+SCTP will not work, assuming they don't need SCTP. That way the OS is generally usable in this specific situation. I think this is a security issue, since it currently seems really easy to bring down a machine this way. It seems more critical than that ICMPv6 issue of a while back.
OK, so it is a problem on i386. Haven't tested that for ages... One problem we ran into in the past was that the stack grew due to inlining by the compiler. Not sure whether that is the case here. I can try to nail that down. Will report back.
(In reply to Eugene Grosbein from comment #13) (In reply to Shreesh Holla from comment #14) Yeah, just bumping i386 KSTACK_PAGES will at least give parity with amd64. That's reasonable. It seems kind of absurd we knowingly ship a broken ZFS on i386 (see UPDATING's reference to KSTACK_PAGES). If people are concerned about a KSTACK_PAGES=2 architecture "keeping us honest," they are free to set KSTACK_PAGES=2 on their own amd64 systems' configuration, rather than inflicting the pain upon i386 users.
(In reply to Conrad Meyer from comment #16) I agree with @eugene. The generally accepted case should just work. Special configurations are, well, special configurations, and they can be tuned accordingly. @michael - so possibly an inlining issue. Ugh! Anyway - will await what you find. The interesting thing, I thought, was that this seems to be an issue only at connection time. I did some tests here and exchanged ~1MB of real data with no issues. But that's with the new KSTACK_PAGES setting.
A commit references this bug:
Author: cem
Date: Mon Dec 11 04:32:37 UTC 2017
New revision: 326758
URL: https://svnweb.freebsd.org/changeset/base/326758
Log:
  i386: Bump KSTACK_PAGES default to match amd64
  Logically, extend r286288 to cover all threads, by default.
  The world has largely moved on from i386. Most FreeBSD users and developers test on amd64 hardware. For better or worse, we have written a non-trivial amount of kernel code that relies on stacks larger than 8 kB, and it "just works" on amd64, so there has been little incentive to shrink it.
  amd64 had its KSTACK_PAGES bumped to 4 back in Peter's initial AMD64 commit, r114349, in 2003. Since that time, i386 has limped along on a stack half the size. We've even observed the stack overflows years ago, but neglected to fix the issue; see the 20121223 and 20150728 entries in UPDATING.
  If anyone is concerned with this change, I suggest they configure their AMD64 kernels with KSTACK_PAGES 2 and fix the fallout there first. Eugene has identified a list of high stack usage functions in the first PR below.
  PR: 219476, 224218
  Reported by: eugen@, Shreesh Holla <hshreesh AT yahoo.com>
  Relnotes: maybe
  Sponsored by: Dell EMC Isilon
Changes:
  head/sys/i386/conf/NOTES
  head/sys/i386/include/param.h
OK, I set up a Release 11.1 i386 VM and a HEAD i386 VM and installed nmap. Using ncat --sctp -l IPv6_address on these VMs, I can NOT reproduce the issue. I used ::1 with a client on the same VM, and a global IPv6 address with a remote FreeBSD head system and a remote Linux (recent Ubuntu) system. In each case I can establish the SCTP association, transfer data in both directions, and tear the association down. I verified this using tcpdump. There must be another parameter in play...
(In reply to Michael Tuexen from comment #19) That's strange. That's exactly what I did to trigger the panic. So maybe there is something else on my systems. I am using VMs too, under VMware desktop. I will try to revert my changes later today and see if I can figure out what else is involved.
I'm using VMware Fusion. The VM uses a single processor core and 256MB of RAM. I installed FreeBSD from FreeBSD-11.1-RELEASE-i386-dvd1.iso. Let me know what you find...
(In reply to Michael Tuexen from comment #21) Here is what I have that might be causing it. I backed out my change from loader.conf, and it panicked immediately when I made an SCTP connection. I do have VMware Tools installed on both machines. I am also running in GUI mode, i.e. the FreeBSD host is running GNOME. Apart from that, I can't see any major differences. If you are still unable to reproduce it, I will retry with a fresh install of both FreeBSD and Ubuntu.
(In reply to Shreesh Holla from comment #22) I tried to install gnome3 and xorg according to the FreeBSD handbook, but GNOME doesn't start. It complains about not finding the default font "fixed". I also failed to install the VMware Tools: when I try to follow the instructions given on the VMware website, I run into errors. The instructions for FreeBSD as a guest OS are also pretty old. Not sure if they still support FreeBSD. As you can guess, I normally use neither the GUI nor the VMware Tools on the VMs... So I can't test your setup, I'm sorry.
OK, I'll reduce the stack usage...
Closing since the default stack size was increased on i386. The two major offenders in SCTP, sctp_auth_get_cookie_params() and sctp_load_addresses_from_init(), are still there. They both allocate three 512-byte buffers on the stack. I can't see an easy way to fix that; all three buffers are used to temporarily store data until we know the combined size of the data, at which point a buffer to store all of it is allocated. It might be possible to avoid the temporary buffers by using m_pulldown() to ensure that the parameter headers are contiguous, and then using m_copydata() to copy data into the key buffer once we know the combined length. This is a bit tricky to get right and I have no setup to test such a change. However, it would shave 1536 bytes off the stack frame and avoid some extra copying.
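The measure-first, copy-second idea can be illustrated outside the kernel. The sketch below is a Python stand-in, not the FreeBSD implementation: it walks a flat byte string of type-length-value parameters twice, first to find the AUTH-related parameters and sum their lengths, then to copy them into a single buffer of exactly that size, so no fixed 512-byte temporaries are needed. The parameter type codes are the ones assigned for SCTP AUTH (RANDOM, CHUNKS, HMAC-ALGO); the function names, the flat-buffer input (standing in for an mbuf chain), and the choice to copy each parameter's header and value without padding are assumptions made for illustration. For scale, the three temporaries total 3 x 512 = 1536 bytes, which is 18.75% of an 8192-byte stack.

```python
import struct

# SCTP AUTH parameter types; everything else in this sketch is hypothetical.
RANDOM, CHUNKS, HMAC_ALGO = 0x8002, 0x8003, 0x8004
WANTED = (RANDOM, CHUNKS, HMAC_ALGO)


def iter_params(buf: bytes):
    """Yield (type, offset, length) for each TLV parameter in buf."""
    off = 0
    while off + 4 <= len(buf):
        ptype, plen = struct.unpack_from("!HH", buf, off)
        if plen < 4 or off + plen > len(buf):
            break  # malformed or truncated parameter: stop parsing
        yield ptype, off, plen
        off += (plen + 3) & ~3  # parameters are padded to 4-byte boundaries


def build_key(buf: bytes) -> bytes:
    """Concatenate the wanted parameters using one exactly-sized buffer."""
    # Pass 1: record where each wanted parameter lives; no data is copied.
    found = {}
    for ptype, off, plen in iter_params(buf):
        if ptype in WANTED and ptype not in found:
            found[ptype] = (off, plen)
    total = sum(plen for _, plen in found.values())

    # Pass 2: allocate once, then copy each parameter straight from the input.
    key = bytearray(total)
    pos = 0
    for ptype in WANTED:
        if ptype in found:
            off, plen = found[ptype]
            key[pos:pos + plen] = buf[off:off + plen]
            pos += plen
    return bytes(key)
```

In the kernel, the slice copy in the second pass would correspond roughly to what m_copydata() does on an mbuf chain, and the header reads in the first pass are where contiguity (m_pulldown()) would matter. The sketch says nothing about the locking, error handling, or limits the real code needs.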
(In reply to Mark Johnston from comment #25) Increasing the stack size is a workaround. The plan was to rewrite the handling such that only one buffer is needed. That is why I left the bug open. Since it is closed now, such an optimisation does not seem to be wanted anymore.
(In reply to Michael Tuexen from comment #26) Sorry. I closed the bug only because the submitter's original problem was resolved and I was going through some old bug reports. Since you plan to work on the bug I reopened the PR.
(In reply to Mark Johnston from comment #27) OK, great. I think reducing the stack usage is worth the effort. It is not that hard and will also improve the handling of pathological parameter configurations. I haven't done this yet because I must also extend packetdrill to have a way to test all the strange disallowed combinations. And packetdrill is still missing support for the Authentication extension...