Bug 221271 - net/asterisk13: rc script fails on "restart" in some occasions
Summary: net/asterisk13: rc script fails on "restart" in some occasions
Status: Closed FIXED
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s)
Version: Latest
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: Guido Falsi
URL:
Keywords: needs-patch, needs-qa
Depends on:
Blocks:
 
Reported: 2017-08-06 08:27 UTC by O. Hartmann
Modified: 2017-08-28 09:33 UTC
CC List: 0 users

See Also:
madpilot: maintainer-feedback+
madpilot: merge-quarterly+


Attachments
Asterisk rc.script fixes (1.49 KB, patch)
2017-08-07 10:50 UTC, Guido Falsi

Description O. Hartmann 2017-08-06 08:27:11 UTC
While running net/asterisk13, the rc script fails to "restart" properly when the asterisk daemon is still starting up. It seems the rc script waits, then loses track of the asterisk13 PID and fails to stop and properly start it again. Meanwhile, an asterisk daemon is still running and can only be killed with kill -9.

How to repeat:

Start asterisk via "service asterisk onestart|start". As soon as the prompt is back on the console, type "service asterisk onerestart|restart". The shell returns immediately while the asterisk daemon is still starting up, so it is possible to issue a restart before the daemon has finished starting in the first place.

This leads to a kind of zombie asterisk which cannot be stopped or restarted via the rc system, only by kill -9/kill -KILL.
Comment 1 Guido Falsi freebsd_committer 2017-08-06 09:49:35 UTC
Hi,

Thanks for the report.

I've not been able to reproduce this completely, but I got some indication of what is happening.

I've got this output:

# service asterisk restart
Stopping asterisk
No such command 'core stop now' (type 'core show help core' for other possible commands)
asterisk already running?  (pid=11267).

The stop rc command is implemented by connecting to asterisk via its command line and sending it the shutdown command "core stop now". There's a race condition because asterisk takes a perceptible time to start and load all its modules; during that time it already accepts control connections, but is not yet ready to process commands there.

While stopping asterisk with a signal could be done, it's unsafe in certain configurations (for example if using database backends).

A possible workaround could be adding a short, maybe optional and configurable, sleep in the stop and reload commands before connecting to the asterisk daemon. This wouldn't fix it in every case, since we cannot know how long the asterisk startup will take.

Such sleep would be disabled by default though, since it is not necessary in most configurations.
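
The workaround described above can be sketched as a small sh helper. This is only an illustration under stated assumptions, not the code from the port's rc script: the function name stop_with_delay and its argument order are hypothetical, and the stop command is passed in by the caller.

```shell
#!/bin/sh
# Hypothetical helper, NOT the port's actual rc script.
# $1      seconds to wait before stopping (0 disables the wait)
# $2...   command that asks the daemon to shut down
stop_with_delay() {
    delay="${1:-0}"
    shift
    if [ "${delay}" -gt 0 ]; then
        # Give a freshly started daemon time to finish loading.
        echo "Waiting ${delay} seconds before stopping"
        sleep "${delay}"
    fi
    # Run the caller-supplied stop command unchanged.
    "$@"
}
```

It could be called as stop_with_delay 5 asterisk -rx "core stop now"; a delay of 0 keeps today's behaviour, which matches the idea of leaving the sleep disabled by default.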

Do you think this could be acceptable?

I'll work out a patch for you to test.
Comment 2 Guido Falsi freebsd_committer 2017-08-07 10:50:36 UTC
Created attachment 185125
Asterisk rc.script fixes

Please test this patch.

I have made the script more resilient by copying logic from standard rc.subr scripts.

I also added a new variable "asterisk_stopsleep" which can be set to a number of seconds to wait before actually sending the stop command to the asterisk console. If necessary you can set this to a time longer than your asterisk configuration startup time.
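
As an illustration, the new variable could be set in /etc/rc.conf like this. The value of 10 seconds is an arbitrary example, and asterisk_enable is assumed here to be the usual enable knob for the port; only asterisk_stopsleep is named in this report.

```shell
# /etc/rc.conf fragment (sh syntax); values are examples only.
asterisk_enable="YES"      # assumed enable knob for the port
asterisk_stopsleep="10"    # seconds to wait before sending the stop command
```

Pick a value longer than the time your own asterisk configuration needs to finish starting.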

Please report back, so I can commit this if it actually solves the issue.

Thanks!
Comment 3 commit-hook freebsd_committer 2017-08-22 13:00:10 UTC
A commit references this bug:

Author: madpilot
Date: Tue Aug 22 12:59:54 UTC 2017
New revision: 448520
URL: https://svnweb.freebsd.org/changeset/ports/448520

Log:
  Make the provided rc script more robust.

  Also add an asterisk_stopsleep knob (disabled by default) to allow
  users to work around a possible race condition when asterisk is sent
  a stop command just after launching, but before it's startup is
  actually completed.

  PR:		221271
  Submitted by:	O. Hartmann <ohartmann@walstatt.org>
  MFH:		2017Q3

Changes:
  head/net/asterisk11/Makefile
  head/net/asterisk11/files/asterisk.in
  head/net/asterisk13/Makefile
  head/net/asterisk13/files/asterisk.in
Comment 4 O. Hartmann 2017-08-22 18:58:49 UTC
Sorry for the late response.

Thank you very much for taking care of this! I have no objection to the patch - how should or could I?

I was looking through the startup process of Asterisk itself for some data written to a pipe/socket/file which could be used as an indication that it is still starting up, but it is not trivial for me to figure that out.

Waiting a certain time is a good solution. But it needs to be adjusted, since on an ARM-based platform the time is obviously larger than on a brand new, upcoming 18/36 core/thread Skylake-SP or even a 16/32 core/thread AMD Threadripper ;-)
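
Because the right delay differs per machine, one alternative to a fixed sleep is to poll a readiness probe with an upper bound on attempts. The sketch below is hypothetical: wait_until is an invented helper name, and no suitable probe for Asterisk has been identified in this report, so the probe command is left to the caller.

```shell
#!/bin/sh
# Hypothetical helper: retry a probe command once per second.
# $1      maximum number of attempts
# $2...   probe command; exit status 0 means "ready"
wait_until() {
    tries="$1"
    shift
    while [ "${tries}" -gt 0 ]; do
        if "$@" >/dev/null 2>&1; then
            return 0            # probe succeeded, daemon is ready
        fi
        tries=$((tries - 1))
        if [ "${tries}" -gt 0 ]; then
            sleep 1             # pause only if another attempt follows
        fi
    done
    return 1                    # gave up, daemon never became ready
}
```

A call such as wait_until 30 some_probe_command would then adapt to slow and fast hardware alike, as long as a reliable probe exists; finding one is the open question.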

On the other hand, I'm fighting with a serious memory leak in Asterisk 13 ... But I'll open another PR for this.

So far, again, thanks a lot.

Oliver Hartmann
Comment 5 Guido Falsi freebsd_committer 2017-08-22 19:11:50 UTC
(In reply to O. Hartmann from comment #4)
> Sorry for the late response.
> 
> Thank you very much for taking care of this! I have no objection to the
> patch - how should or could I?
> 
> I was looking through the startup process of Asterisk itself for some data
> written to a pipe/socket/file which could be used as an indication that it
> is still starting up, but it is not trivial for me to figure that out.
> 
> Waiting a certain time is a good solution. But it needs to be adjusted,
> since on an ARM-based platform the time is obviously larger than on a brand
> new, upcoming 18/36 core/thread Skylake-SP or even a 16/32 core/thread AMD
> Threadripper ;-)
> 

This is the best I could do. Anyway, stopping asterisk just a few seconds after starting it should be a rare event.

You can cut down on the startup time by disabling modules you're not using via the modules.conf configuration file. In a minimal test I made, I was able to make it start in less than a few seconds.

> On the other hand, I'm fighting with a serious memory leak in Asterisk 13
> ... But I'll open another PR for this.

Memory leaks are difficult to track down. You should build a minimal and simple use case reproducing the bug. I could test that too and see if it is reproducible.

Upstream officially supports only i386/amd64 Linux. While they accept the FreeBSD-specific patches I send them from time to time, that's not a priority for them. I suspect ARM support is an even lower priority for them.

P.S. I'll close this PR once I have merged this to the 2017Q3 branch.
Comment 6 commit-hook freebsd_committer 2017-08-27 20:36:16 UTC
A commit references this bug:

Author: madpilot
Date: Sun Aug 27 20:35:39 UTC 2017
New revision: 448842
URL: https://svnweb.freebsd.org/changeset/ports/448842

Log:
  MFH: r448520

  Make the provided rc script more robust.

  Also add an asterisk_stopsleep knob (disabled by default) to allow
  users to work around a possible race condition when asterisk is sent
  a stop command just after launching, but before it's startup is
  actually completed.

  PR:		221271
  Submitted by:	O. Hartmann <ohartmann@walstatt.org>

  Approved by:	ports-secteam (delphij)

Changes:
_U  branches/2017Q3/
  branches/2017Q3/net/asterisk11/Makefile
  branches/2017Q3/net/asterisk11/files/asterisk.in
  branches/2017Q3/net/asterisk13/Makefile
  branches/2017Q3/net/asterisk13/files/asterisk.in
Comment 7 Guido Falsi freebsd_committer 2017-08-28 09:33:52 UTC
Committed and merged.