Running 13-STABLE (13.0-STABLE #7 stable/13-n246963-2cbe61a73d8: Thu Aug 26 18:35:19 CEST 2021 amd64) with net/asterisk18 (Asterisk 18.6.0) gives me headaches: Asterisk 18.5.0 stayed up for weeks without issues, whereas 18.6.0 now dies silently on unknown occasions.
The system restarts mpd5 at night to force the acquisition of a new IP address, and on one occasion I could link the network restart to the crash, based on the point at which the log entries stopped. In other cases I do not have any clue. Sometimes 18.6.0 lasts for a day or two, sometimes only hours; as of today, asterisk died after less than 8 hours of uptime.
(In reply to O. Hartmann from comment #0)
Hello! Have you found a reason? I am experiencing the same in a 12.2 jail. Maximum verbose and debug output gives me no clue.
(In reply to Oljas Kuzembaev from comment #1)
Well, I do not have a clue at all, but I think there is an issue with the network. My router uses mpd5 as its PPPoE service. To force a change of the IPv4 address given by the provider, I restart the mpd5 service via cron at a specific time and, as I wrote earlier in the PR, in almost all cases that is also the time at which asterisk's last log entry was registered.
This time I had to restart mpd5 due to some development on my dynDNS scripts and, voilà, asterisk's CPU usage rose shortly after mpd5 had been restarted, and it then crashed with signal 11.
I use IPFW as the one and only IP filtering system. I will now also restart IPFW from the mpd5 link-up script whenever the IPs are about to change and see what happens to asterisk.
By the way: my asterisk configuration uses the symbolic name (hence dynDNS), and I also use a special setup of dns/bind916. That is, I resolve the ISP's SIP proxy/SIP server with the DNS given by the ISP and everything else with a "free and open" resolver. This kind of setup seemed to cause some trouble in the past, although I can't be specific about that.
This is the crash scenario when I force the change of address. I have not yet investigated whether asterisk dies the same way when the ISP changes the outbound addresses.
I'm taking this as maintainer.
Please put the name of an actual port in the title (net/asterisk16 or net/asterisk18 in this case) so Bugzilla can guess whom to auto-assign it to.
From what you tell me, my first guess is that asterisk is not handling IP changes on its bound interfaces correctly and crashes as a consequence. But it could also be something else.
O. Hartmann: you say it crashes with signal 11, so it should leave a core dump behind. Can you grab it? Even better, could you compile asterisk with "WITH_DEBUG=yes" defined and grab a core dump from that build?
Without a core dump, or at least a backtrace, we can only guess, and it's very difficult to find out what the issue is.
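A rough sketch of those two steps, in case it helps. The binary path /usr/local/sbin/asterisk, the core file location and the output file name are assumptions on my side and will need adjusting; gdb comes from devel/gdb and is not part of the base system.

```sh
#!/bin/sh
# Sketch only: rebuild net/asterisk18 with debug symbols, then turn a core
# file into a text backtrace. All paths below are assumptions.
PORTDIR="/usr/ports/net/asterisk18"
BINARY="/usr/local/sbin/asterisk"      # assumed install location of the binary
CORE="/var/tmp/asterisk.core"          # hypothetical core file location
OUT="asterisk-backtrace.txt"           # hypothetical output file

# Rebuild and reinstall the port with WITH_DEBUG=yes (only if the ports tree is present).
if [ -d "${PORTDIR}" ]; then
    make -C "${PORTDIR}" WITH_DEBUG=yes clean build deinstall install
fi

# Dump a full backtrace of every thread from the core file, non-interactively.
if [ -r "${CORE}" ] && command -v gdb >/dev/null 2>&1; then
    gdb -batch -ex "thread apply all bt full" "${BINARY}" "${CORE}" > "${OUT}" 2>&1
    echo "backtrace written to ${OUT}"
else
    echo "no readable core file at ${CORE}, or gdb is not installed"
fi
```

The resulting text file can be attached to this PR; a backtrace from a build without debug symbols is still better than none.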
Addendum: Port net/asterisk18 seems to crash (+pid 6661 (asterisk), jid 0, uid 931: exited on signal 11) whenever the outbound IP address in our config changes.
We/I have a private, ISP-based network connection. It is dual stack: IPv4 and IPv6 are both configured, but IPv6 is not used so far. IPv4 goes via NAT. For the record, the local host running asterisk18 is referred to by its symbolic name, not by its IP.
The crash can be triggered by manually restarting mpd5 (service mpd5 restart) to obtain a new IP. We do this on purpose at night to avoid a timed IP change by the ISP.
I haven't yet checked whether a crash also occurs when the change of the assigned IPs is triggered by the ISP rather than by the forced restart of mpd5.
Forcing FreeBSD to restart asterisk 15 minutes after mpd5 has been restarted and the dynDNS service has been triggered to update the DNS entries also leads to a crash in some cases. For the moment we have moved that restart to 1 1/2 hours after the restart of mpd5, and asterisk has not died since. In my opinion, this would point to some problem with resolving asterisk's local (outbound) hostname?
Prior to asterisk 18.6.0 we could reset the outbound interface as we liked without any problems and asterisk kept up, but I have to confess we swapped the PPP connector from (limited and slow) ppp to mpd5 recently.
I hope this observation gives some hints as to what's wrong.
While such observations can help isolate the problem, it's difficult to find the exact spot in the code where the issue is happening from them, especially for a complex piece of software like asterisk. Even a person with intimate knowledge of the code base (which I don't possess) would have a hard time finding the problem this way.
Signal 11 crashes usually leave behind a core dump. If the crashing binary is compiled with debug symbols, such a core dump identifies the exact spot in the code (or at least the function) where the crash is happening. This would make it possible for someone to have a look at it without wild guesses.
If you want this crash solved, please try a debug build and grab the core file. It's almost impossible to guess correctly what is happening without that.
If you're running on a read-only filesystem, the core dump cannot be written there. In that case, configure the system to save core dumps somewhere with write access (a memory-backed /tmp or /var, for example); this can be done with the kern.corefile sysctl (see core(5)).
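For illustration, a minimal sketch of such a setup. The directory /var/coredumps is a hypothetical choice, and whether kern.sugid_coredump is needed on your system is an assumption on my part (asterisk normally drops privileges after start, and such processes may not dump core by default).

```sh
#!/bin/sh
# Sketch only: send core dumps to a writable directory.
COREDIR="/var/coredumps"               # hypothetical, any writable location works
# %N expands to the process name and %P to the PID (see core(5)).
COREPATTERN="${COREDIR}/%N.%P.core"

if [ "$(uname -s)" = "FreeBSD" ] && [ "$(id -u)" -eq 0 ]; then
    mkdir -p "${COREDIR}"
    # World-writable with sticky bit, like /tmp, so the asterisk user can write
    # there; tighten the permissions as appropriate.
    chmod 1777 "${COREDIR}"
    sysctl kern.corefile="${COREPATTERN}"
    # Assumption: required because asterisk changes its credentials after start.
    sysctl kern.sugid_coredump=1
else
    echo "not applied (needs FreeBSD and root); would set kern.corefile=${COREPATTERN}"
fi
```

To keep the settings across reboots, the same kern.corefile and kern.sugid_coredump lines can go into /etc/sysctl.conf.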