Bug 224505

Summary: net/asterisk13: Unable to chown run directory to 931 (asterisk) (/var/run on tmpfs)
Product: Ports & Packages Reporter: O. Hartmann <ohartmann>
Component: Individual Port(s)Assignee: Guido Falsi <madpilot>
Status: Closed Overcome By Events    
Severity: Affects Only Me Flags: madpilot: maintainer-feedback+
Priority: ---    
Version: Latest   
Hardware: Any   
OS: Any   

Description O. Hartmann 2017-12-21 19:56:53 UTC
Asterisk, most recent from the port, doesn't start properly when /var/run/asterisk resides on TMPFS - this is in our environment the case.

While booting, I watch this message appearing:

[...]
Starting asterisk.
Unable to chown run directory to 931 (asterisk)
meuh
prcmd

and ll /var/run gives

[...]
total 64
drwxr-xr-x  2 root        asterisk       0 Dec 16 16:26 asterisk/
[..]

The daemon asterisk is running, as one could easily proof via top, but trying to connect to asterisk via 

root@gate:~ # asterisk -rv
Unable to connect to remote asterisk (does /var/run/asterisk/asterisk.ctl exist?)


fails in the shown error.
Comment 1 Guido Falsi freebsd_committer freebsd_triage 2017-12-21 21:42:00 UTC
*** Bug 224506 has been marked as a duplicate of this bug. ***
Comment 2 Guido Falsi freebsd_committer freebsd_triage 2017-12-21 22:03:22 UTC
Hi,

Asterisk uses that directory for it's pid file and the control socket, needed for remote connections. Otherwise it can actually perform it's job. This explains why this does not stop it from starting anyway.

The failing command is run as root before actually starting asterisk and dropping privileges(which is done by asterisk itself after startup). At least on a standard setup.

I also do have a 11.1 system running asterisk with /var on tmpfs and am not seeing such behaviour, so I need more information.

Do you have any special configurations which could interfere with this?

In your setup, what is the actual tmpfs mountpoint? Only /var/run is tmpfs mounted or all of /var?
if all of /var is mounted tmpfs it has to be populated from somewhere, usually using the /etc/rc.initdiskless script.

With default config that script takes it's data from /conf/base/var.

So could you check in /conf/base/var/run? a full "ls -al /conf/base/var/run" could help.

Otherwise, disabling asterisk, rebooting the machine and sending me the output of "ls -al /var/run" would be useful to know thee pristine condition of the parent directory, before the asterisk startup script does anything.


Thanks
Comment 3 O. Hartmann 2017-12-26 19:39:48 UTC
I'm aware of what asterisk is placing in /var/run and that troubles me since it can not establish its controlling socket and PID file there which are usually:

total 4
srwxr-xr-x  1 asterisk  asterisk  0 Dec 26 18:55 asterisk.ctl=
-rw-r--r--  1 asterisk  asterisk  6 Dec 26 18:55 asterisk.pid

Not having the controlling socket - asterisk.ctl - leaves me without the ability to join the daemon via asterisk -rv. Not having the PID file leaves me with no control via rc scripts.

Since NanoBSD created /var as tmpfs as a whole, I guess there is nothing special:

/dev/gpt/dsks1a on / (ufs, local, read-only)
devfs on /dev (devfs, local, multilabel)
tmpfs on /etc (tmpfs, local)
tmpfs on /var (tmpfs, local)
map auto_pool on /pool (autofs)

Apart from that, I also use some autofs mounted filesystems for asterisk:

/dev/gpt/asterisk on /pool/asterisk (ufs, local, journaled soft-updates, multilabel, automounted)
/dev/gpt/logging on /pool/logging (ufs, local, journaled soft-updates, multilabel, automounted)

But even if I add REQUIRE: automount to the rc-script, the result is the same!

This is /var/run without asterisk running:

-rw-------  1 root        wheel          5 Dec 26 19:20 automountd.pid
-rw-------  1 root        wheel          5 Dec 26 19:22 autounmountd.pid
-rw-r--r--  1 root        wheel          0 Dec 26 19:20 clean_var
-rw-------  1 root        wheel          4 Dec 26 19:22 cron.pid
-rw-r--r--  1 root        wheel          6 Dec 26 19:20 ddclient.pid
drwxr-xr-x  2 dhcpd       dhcpd         64 Dec 26 19:20 dhcpd/
-rw-r--r--  1 root        wheel       9151 Dec 26 19:20 dmesg.boot
drwxr-xr-x  2 root        wheel          0 Dec 26 17:55 fail2ban/
-r--r--r--  1 root        wheel        247 Dec 26 19:20 ld-elf.so.hints
-r--r--r--  1 root        wheel        129 Dec 26 19:20 ld-elf32.so.hints
drwxr-xr-x  2 root        wheel          0 Dec 26 17:54 libuuid/
srw-rw-rw-  1 root        wheel          0 Dec 26 19:20 log=
srw-------  1 root        wheel          0 Dec 26 19:20 logpriv=
drwxr-xr-x  2 bind        bind         128 Dec 26 19:20 named/
-rw-r--r--  1 root        wheel          5 Dec 26 19:20 ntpd.pid
-rw-------  1 root        wheel          5 Dec 26 19:22 powerd.pid
drwxr-xr-x  2 root        wheel          0 Dec 26 17:52 ppp/
drwxr-xr-x  2 freeradius  freeradius     0 Dec 26 17:55 radiusd/
-r--r--r--  1 root        wheel          0 Dec 26 19:20 rpcbind.lock
srw-rw-rw-  1 root        wheel          0 Dec 26 19:20 rpcbind.sock=
-rw-------  1 root        wheel          6 Dec 26 19:20 smartd.pid
-rw-r--r--  1 root        wheel          5 Dec 26 19:20 sshd.pid
drwx--x--x  2 root        wheel          0 Nov 24 12:11 sudo/
-rw-------  1 root        wheel          5 Dec 26 19:22 syslog.pid
-rw-r--r--  1 root        wheel          0 Dec 26 19:20 syslogd.sockets
drwx------  2 root        _tss           0 Nov 23 11:12 tpm/
-rw-r--r--  1 root        wheel          5 Dec 26 19:20 tun0.pid
-rw-r--r--  1 root        wheel        394 Dec 26 19:22 utx.active
drwxr-xr-x  2 root        wheel          0 Dec 26 17:52 wpa_supplicant/


I also set debugging on in rc.conf and I found this one confusing:

/etc/rc: DEBUG: checkyesno: asterisk_enable is set to YES.
/etc/rc: DEBUG: run_rc_command: start_precmd: asterisk_precmd 
/etc/rc: DEBUG: checkyesno: rc_startmsgs is set to YES.
Starting asterisk.
/etc/rc: DEBUG: run_rc_command: doit:  limits -C daemon su -m asterisk -c 'sh -c "/usr/local/sbin/asterisk  -n -F -U asterisk"'
Unable to chown run directory to 931 (asterisk)
meuh
Comment 4 O. Hartmann 2017-12-26 19:44:23 UTC
(In reply to Guido Falsi from comment #2)
/conf/base/var/run:

drwxr-xr-x  2 root        asterisk    512 Dec 25 01:51 asterisk/
drwxr-xr-x  2 root        wheel       512 Dec 26 17:55 fail2ban/
-r--r--r--  2 root        wheel       233 Dec 26 17:55 ld-elf.so.hints
drwxr-xr-x  2 root        wheel       512 Dec 26 17:54 libuuid/
drwxr-xr-x  2 root        wheel       512 Dec 26 17:52 ppp/
drwxr-xr-x  2 freeradius  freeradius  512 Dec 26 17:55 radiusd/
drwx--x--x  2 root        wheel       512 Nov 24 12:11 sudo/
drwx------  2 root        _tss        512 Nov 23 11:12 tpm/
drwxr-xr-x  2 root        wheel       512 Dec 26 17:52 wpa_supplicant/
Comment 5 O. Hartmann 2017-12-26 20:17:52 UTC
I checked for the /conf/base/var/asterisk settings and set the folder herein to ownership 931:931 (asteriks:asterisk).

After rebooting, asterisk restarted properly.

Taken this as the phenomenology of the problem, the reason and source is unknown. When ports get installed in the NanoBSD environment, nanobsd script, namely defaults.sh, copy the entire var folder to conf/base/var. So I assume the settings of the initial DESTDIR/var/run/asterisk folder is wrong due to the installation script of the port.
Comment 6 Guido Falsi freebsd_committer freebsd_triage 2017-12-26 21:31:00 UTC
(In reply to O. Hartmann from comment #5)
> I checked for the /conf/base/var/asterisk settings and set the folder herein
> to ownership 931:931 (asteriks:asterisk).
> 
> After rebooting, asterisk restarted properly.
> 
> Taken this as the phenomenology of the problem, the reason and source is
> unknown. When ports get installed in the NanoBSD environment, nanobsd
> script, namely defaults.sh, copy the entire var folder to conf/base/var. So
> I assume the settings of the initial DESTDIR/var/run/asterisk folder is
> wrong due to the installation script of the port.

What do you mean exactly by "the installation script"? The asterisk port comes with no pkg-install scripts. It has a pkg-plist, which is converted to a manifest in the ppackage file and then interpreted by the pkg tool at installation time. That's all that is done when the port is installed. No shell or other language script is launched.

I have checked with a NanoBSD image with asterisk I do have and there the directory has been created correctly.

The asterisk port plist has all needed directives to set the asterisk:asterisk ownership on the file.

You should check if the bug repeats itself with a clean nanobsd image build, first of all.

If it does not repeat Maybe you just had some old directory with wrong permissions lying around, or some botched build left it laying there.

If it repeats regularly maybe there is some issue but I need you to investigate exactly how and when.

By the way, do you build the nanobsd images as root? I don't think building those as a user is actually supported, and could explain why the file ended up with wrong ownership, since normal users can't change ownership of a file to another user, but they can change group ownership.
Comment 7 O. Hartmann 2017-12-27 13:43:34 UTC
On what version of FreeBSD did you do the tests?

I'm with CURRENT (revision 327237).

NanoBSD builds are done as root.

I did a clean build and the problem persists.

The installation scripts (NanoBSD config files in strictu sensu) hasn't been changed for months.

The difference in my NanoBSD config compared to the regular one is that I install via pkg from a remote repository, that has been created via poudriere. I'll check on that when ready to perform a dry run of pkg install.
Comment 8 Guido Falsi freebsd_committer freebsd_triage 2017-12-27 15:00:37 UTC
(In reply to O. Hartmann from comment #7)
> On what version of FreeBSD did you do the tests?
> 
> I'm with CURRENT (revision 327237).

My nanobsd imaages are based on 11.1-RELEASE. But I don't think that's a factor here. It's a simple file ownership thing.

> 
> NanoBSD builds are done as root.
> 
> I did a clean build and the problem persists.
> 
> The installation scripts (NanoBSD config files in strictu sensu) hasn't been
> changed for months.

Neither the asterisk port pkg plist, nor the pkg software logic AFAIK.

My objective is to understand if this is a general problem or specific to your setup. Since we're talking about custom nanobsd images, I tend to think it's not a general problem with the port but something specific to your setup.

> 
> The difference in my NanoBSD config compared to the regular one is that I
> install via pkg from a remote repository, that has been created via
> poudriere. I'll check on that when ready to perform a dry run of pkg install.

That's where I'd investigate. I also use a customization to the nanobsd scripts to install packages.

The asterisk13 port has a directive in it's pkg-plist to create /var/run/asterisk, setting the permissions depending on a variable in the port Makefile, defaulting to asterisk:asterisk. That works fine usually.

Depending on how pkg is invoked and the filesystem state maybe it has problems chowning the file at the installation time.
I'd try to understand at what exact point of the nanobsd build the directory is created and if it's created with correct permissions at that time (you can manually interrupt a clean build right after package installation, or simply add an exit command at the appropriate point in the scripts).

After that if at that time it's correct check there is no other script doing a chown, maybe some recursive one(I'm saying that because I've been bitten by this a few times).

It is difficult to guess more without looking at your nanobsd customizations. Could you share the custom function or script used to install the packages in the image being built?
Comment 9 O. Hartmann 2017-12-28 11:43:38 UTC
Essentially, the way we install packages, is given by these lines in the configuration file:

        env ABI=${ABI} pkg -4 -c ${NANO_WORLDDIR} install -fRy ${PACKAGES}
        # Cleanup
        #pkg -4 -c ${NANO_WORLDDIR} -vv
        pkg -4 -c ${NANO_WORLDDIR} autoremove -y
        pkg -4 -c ${NANO_WORLDDIR} clean

        # clean cache, saving space
        rm -rvf ${NANO_WORLDDIR}/var/cache/pkg
        # removing the package database
        rm -rvf ${NANO_WORLDDIR}/var/db/pkg

Nothing more. The repo backend is a poudriere made repo.

This worked now for almost two years without changes and out of the blue it doesn't.

When looking at the "original" folder structure that gets copied by the NanoBSD script into the diskless destination /conf/base/var/run, I face this:

[...]
[run]: ll
total 26
657497 drwxr-xr-x  10 root  wheel    uarch   11B 27 Dez. 22:42 ./
657465 drwxr-xr-x  26 root  wheel    uarch   26B 27 Dez. 22:42 ../
632208 drwxr-xr-x   2 root  931      uarch    2B 25 Dez. 01:51 asterisk/
634818 drwxr-xr-x   2 root  wheel    uarch    2B 27 Dez. 22:42 fail2ban/
634318 -r--r--r--   2 root  wheel    uarch  233B 27 Dez. 22:42 ld-elf.so.hints
677986 drwxr-xr-x   2 root  wheel    uarch    2B 27 Dez. 22:42 libuuid/
657498 drwxrwx---   2 root  network  uarch    2B 27 Dez. 22:40 ppp/
634017 drwxr-xr-x   2 133   133      uarch    2B 27 Dez. 22:42 radiusd/
632865 drwx--x--x   2 root  wheel    uarch    2B 24 Nov. 12:11 sudo/
664383 drwx------   2 root  _tss     uarch    2B 23 Nov. 11:12 tpm/
657499 drwxr-xr-x   2 root  wheel    uarch    2B 27 Dez. 22:40 wpa_supplicant/

FreeRADIUS is another port coming from the very same poudriere source and is installed the very same way - also for two years for now. And obviously it does get its correct folder ownership. Strange, isn't it?
Comment 10 Guido Falsi freebsd_committer freebsd_triage 2017-12-28 14:22:18 UTC
(In reply to O. Hartmann from comment #9)
> 
> This worked now for almost two years without changes and out of the blue it
> doesn't.

Ok, but what could have changed in the asterisk port which could break it?

I have made very little changes to the asterisk port in the last year, and very few modifications to the plist.

> 
> When looking at the "original" folder structure that gets copied by the
> NanoBSD script into the diskless destination /conf/base/var/run, I face this:
> 
> [...]
> [run]: ll
> total 26
> 657497 drwxr-xr-x  10 root  wheel    uarch   11B 27 Dez. 22:42 ./
> 657465 drwxr-xr-x  26 root  wheel    uarch   26B 27 Dez. 22:42 ../
> 632208 drwxr-xr-x   2 root  931      uarch    2B 25 Dez. 01:51 asterisk/
> 634818 drwxr-xr-x   2 root  wheel    uarch    2B 27 Dez. 22:42 fail2ban/
> 634318 -r--r--r--   2 root  wheel    uarch  233B 27 Dez. 22:42
> ld-elf.so.hints
> 677986 drwxr-xr-x   2 root  wheel    uarch    2B 27 Dez. 22:42 libuuid/
> 657498 drwxrwx---   2 root  network  uarch    2B 27 Dez. 22:40 ppp/
> 634017 drwxr-xr-x   2 133   133      uarch    2B 27 Dez. 22:42 radiusd/
> 632865 drwx--x--x   2 root  wheel    uarch    2B 24 Nov. 12:11 sudo/
> 664383 drwx------   2 root  _tss     uarch    2B 23 Nov. 11:12 tpm/
> 657499 drwxr-xr-x   2 root  wheel    uarch    2B 27 Dez. 22:40
> wpa_supplicant/
> 
> FreeRADIUS is another port coming from the very same poudriere source and is
> installed the very same way - also for two years for now. And obviously it
> does get its correct folder ownership. Strange, isn't it?

First of all, a quick look at the Freeradius port shows it's doing things very differently from the asterisk port. This implies that some unrelated change could have broken one and not the other.

The Freeradius port is using an actual post-install script to change the ownership. Problem is nothing in that part of the asterisk port changed, so why is it broken for you now? Before changing the port I need to understand exactly what is happening.

Asterisk also creates other directories in /var, could you check the ownership of /var/log/asterisk and /var/spool/asterisk?

Are you able to narrow down when this started to not work again

It could be something is changed in the pkg program. I'll analyse that.
Comment 11 Guido Falsi freebsd_committer freebsd_triage 2017-12-28 16:07:05 UTC
(In reply to O. Hartmann from comment #9)
> When looking at the "original" folder structure that gets copied by the
> NanoBSD script into the diskless destination /conf/base/var/run, I face this:
> 

by the way, could you also report "ls -al" of the ${NANO_OBJ}/_.w/var/run directory?

That should be where nanobsd scripts are grabbing things...

What I can't understand is if and why pkg is not setting correct permissions on the asterisk directory, since everything in the port is setup for that to happen.
Comment 12 Guido Falsi freebsd_committer freebsd_triage 2018-01-18 22:31:18 UTC
Are you able to provide some further information?

Is the problem happening still or removing the faulty directory made it disappear?

If the problem can't be reproduced anymore I'd rather close this bug report stating "unable to reproduce".

Thanks.
Comment 13 O. Hartmann 2018-01-19 18:40:56 UTC
The problem still exist and I can reproduce it on any NanoBSD we use. Since those NanoBSD environments do not have any speciality and they also install other ports, like FreeRADIUS, which installs also some folder with special ownership (freeradius:freeradius), I consider this this problem asterisk-specific.
Comment 14 Guido Falsi freebsd_committer freebsd_triage 2018-01-20 09:39:39 UTC
(In reply to O. Hartmann from comment #13)
> The problem still exist and I can reproduce it on any NanoBSD we use. Since
> those NanoBSD environments do not have any speciality and they also install
> other ports, like FreeRADIUS, which installs also some folder with special
> ownership (freeradius:freeradius), I consider this this problem
> asterisk-specific.

I understand and have no objection, but unluckily with the information provided I am unable to identify the actual cause or a course of action to fix it, considering that other use cases don't show the problem.

Since you're using customization scripts to create the NanoBSD image, it is possible some of it is triggering the issue. I don't mean the scripts are wrong, just that the exact sequence you're using is exposing a problem, but without a precise analysis of the scripts I cannot understand what is happening.

I'm also generating NanoBSD images with the asterisk port and not seeing the issue, most probably because my customization scripts are doing things differently (not necessarily more correctly, just differently)

Could you provide the information I asked for in comments 10 and 11?

One difference between my scripts and yours is I'm using the NanoBSD provided CR macro provided here:

https://github.com/freebsd/freebsd/blob/f95c99edc61467fb8b955a7823640bbb7204c5f7/tools/tools/nanobsd/defaults.sh#L285

to call pkg, like this:

CR env REPOS_DIR="/tpkgs" SIGNATURE_TYPE=none pkg install -y ${MY_PORTS}

(I'm taking then from a filesystem instead of a remote server but that should nt make a difference)

You're using the pkg -c option. While it should be the same it could make a difference.

As a band aid you can obviously add a chown line after installing the packages to fix the file permissions before creating the image.
Comment 15 O. Hartmann 2018-01-21 10:14:31 UTC
(In reply to Guido Falsi from comment #10)

> Asterisk also creates other directories in /var, could you check the ownership 
> of /var/log/asterisk and /var/spool/asterisk?

Of course:

#: ls -l _.w/var/log
total 8
drwxr-xr-x  5 root  931      5 14 Jan. 11:50 asterisk
drwx------  2 133   133      2 14 Jan. 13:18 radacct
-rw-------  2 133   133      0 21 Jan. 09:31 radius.log
-rw-------  2 133   133      0 21 Jan. 09:31 radutmp
-rw-------  2 133   133      0 21 Jan. 09:31 radwtmp
-rw-r-----  2 root  wheel    0 21 Jan. 09:30 sendmail.st
-rw-------  2 root  wheel  916 21 Jan. 09:31 userlog
[...]

and

: ls -l _.w/var/spool/
total 12
drwxr-xr-x  10 root   931     10 14 Jan. 11:50 asterisk
drwxrwx---   2 smmsp  smmsp    2 21 Jan. 09:29 clientmqueue
drwxrwx---   2 root   mail     2 21 Jan. 09:29 dma
drwxrwxr-x   2 uucp   dialer   2 21 Jan. 09:29 lock
drwxr-xr-x   2 root   daemon   2 21 Jan. 09:29 lpd
drwxr-xr-x   2 root   daemon   2 21 Jan. 09:29 mqueue
drwx------   2 root   daemon   2 21 Jan. 09:29 opielocks
drwxr-xr-x   3 root   daemon   3 21 Jan. 09:29 output
[...]

> Are you able to narrow down when this started to not work again

No, I'm sorry for that. It was in autumn 2017, that is the last point I started to recognize the problem. The application has been running for a while then (> 2 months). 

> It could be something is changed in the pkg program. I'll analyse that.

Thank you. 

The log file shows this - user creation is preceeded by group creation - but I think its not important here, but anyway ... 

[...]
[73/102] Installing asterisk13-13.19.0...
===> Creating groups.
Creating group 'asterisk' with gid '931'.
Creating group 'dahdi' with gid '843'.
===> Creating users
Creating user 'asterisk' with uid '931'.
Adding user 'asterisk' to group 'dahdi'.
[73/102] Extracting asterisk13-13.19.0: .......... done

and because I mentioned it, just the thing for freeraduis:

[...]
[92/102] Installing freeradius3-3.0.15_4...
===> Creating groups.
Creating group 'freeradius' with gid '133'.
===> Creating users
Creating user 'freeradius' with uid '133'.
===> Setting user and group in radiusd.conf
[92/102] Extracting freeradius3-3.0.15_4: .......... done
===> Bootstrapping default certificates, please wait...
===> Adjusting ownership of directory /usr/local/etc/raddb
===> Adjusting ownership of directory /var/log/radacct
===> Adjusting ownership of directory /var/run/radiusd
===> Adjusting ownership of /var/log/radius.log
===> Adjusting ownership of /var/log/radutmp
===> Adjusting ownership of /var/log/radwtmp
===> Updating libdir in /usr/local/etc/raddb/radiusd.conf



For the moment, I go with a interim "late_customization" script fixing the issue manuelly - but you already mentioned and suggested that.

I'll try to setup an alternative jail for building/installing asterisk and I'll check what's going on in the staging folders.