Bug 237600 - sysutils/apcupsd: the current shutdown procedure is unsuitable, especially for servers
Summary: sysutils/apcupsd: the current shutdown procedure is unsuitable, especially fo...
Status: Closed FIXED
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s) (show other bugs)
Version: Latest
Hardware: Any Any
: --- Affects Some People
Assignee: Danilo G. Baio
URL:
Keywords: needs-patch
Depends on:
Blocks:
 
Reported: 2019-04-27 10:31 UTC by Victor Sudakov
Modified: 2019-09-18 14:52 UTC (History)
1 user (show)

See Also:
dbaio: maintainer-feedback+


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Victor Sudakov 2019-04-27 10:31:20 UTC
The default setup of the port is to run "apcupsd --kill-on-powerfail". This means when the UPS power is exhausted, apcupsd sends a signal to the UPS to hybernate and initiates a system shutdown. Thus the complete shutdown sequence must fit within the UPS grace period.

While this may be tolerated on workstations, it is completely unusable on busy servers, especially running multiple virtual machines in bhyve, because the UPS grace period may be not long enough for all the VMs to shut down correctly. Thus we observe a race condition and cannot predict if there will be enough time for the daemons/VMs to stop, before the host is powered off.

Ideally, apcupsd should wait for all the shutdown scripts to complete, and only then should it send the hybernate signal to the UPS. This means that:

1. apcupsd should be started without the --kill-on-powerfail option, perhaps without any options at all, or with the --term-on-powerfail option. It is easy to implement, just alter the default apcupsd_flags.

2. The "apcupsd --killpower" command should be called somewhere near the very end of the shutdown sequence, on the condition that /var/run/powerfail exists. 

I cannot currently suggest where to put "apcupsd --killpower" in /usr/local/etc/rc.d/ so that it is executed last.

I think that the way apcupsd is run on FreeBSD should be seriously reconsidered.
Comment 1 Danilo G. Baio freebsd_committer 2019-04-27 12:27:23 UTC
Hi Victor.

Thanks for your report.

About your point 2:

"2. The "apcupsd --killpower" command should be called somewhere near 
the very end of the shutdown sequence, on the condition that 
/var/run/powerfail exists."

Maybe you can customize it with apccontrol and your own script.


And about removing the --kill-on-powerfail option, well, that can be tricky.
We are talking about a behavior that is present for 17 years in this port:
https://svnweb.freebsd.org/ports/head/sysutils/apcupsd/files/apcupsd.sh.sample?revision=50851&view=markup&pathrev=50851

What I suggest is to improve pkg-message with a warning and considerations
about this, and maybe a custom sample script for your point 2 and then increase
the PORTVERSION to force the update and to inform all users.
After a while, we can think about changing this behavior.

That sounds reasonable?

And thank you for pointing this out.

Regards.
Comment 2 Victor Sudakov 2019-04-27 13:15:41 UTC
(In reply to Danilo G. Baio from comment #1)
> Maybe you can customize it with apccontrol and your own script.

I'm afraid I cannot, and I'll explain why it's a problem.

If apcupsd has been started with --kill-on-powerfail, it first sends the hybernate signal to the UPS and then starts the doshutdown procedure from apccontrol. No matter what you put into doshutdown/apccontrol, you are already at the mercy of the UPS. It won't wait for your scripts and customizations.

That's why I'm suggesting that doshutdown and the killing of UPS should be separated.

> a behavior that is present for 17 years 

17 years is a very long term for a software, maybe it's time for a revolution? No custom script will work untul --kill-on-powerfail is there.
Comment 3 Danilo G. Baio freebsd_committer 2019-04-27 13:19:42 UTC
(In reply to Victor Sudakov from comment #2)

you can if you insert apcupsd_flags="" in your rc.conf
Comment 4 Victor Sudakov 2019-04-27 13:30:29 UTC
(In reply to Danilo G. Baio from comment #3)
> you can if you insert apcupsd_flags="" in your rc.conf

Ah, then yes, of course, if I change the whole logic. For the present, I suggest the following in in rc.conf:

apcupsd_flags="--term-on-powerfail" 

and the following at the end of /etc/rc.shutdown:

test -f /var/run/powerfail && apcupsd --killpower

But I feel that modifying rc.shutdown is ugly.

What say you?
Comment 5 Danilo G. Baio freebsd_committer 2019-04-27 13:39:41 UTC
(In reply to Victor Sudakov from comment #4)

you should take a look at the apccontrol script, you can customize each command, there is an explanation there
Comment 6 Victor Sudakov 2019-04-27 13:59:22 UTC
(In reply to Danilo G. Baio from comment #5)
> you should take a look at the apccontrol script, you can customize each command, there is an explanation there

I have already studied the script and its man page, but I have no idea what can be changed *there* to get rid of the race condition between shutting down daemons and hybernating the UPS. What do you suggest changing there?

To me, the apccontrol script is fine, we just need, after we call doshutdown, to wait for all daemons to stop, and only then send the kill signal to the UPS.

If you read the man page to the appcontrol script, you will see the following:

killpower    apcupsd does not normally generate this event. Instead, it
                  is invoked directly from the system halt script as
                  'apccontrol killpower' because the killpower event needs to
                  be performed as late in system shutdown as possible.

That's exactly what I'm talking about the whole time: we should shift the killpower as late as possible, but how do we implement this on FreeBSD?
Comment 7 Danilo G. Baio freebsd_committer 2019-04-27 14:16:28 UTC
I don't want you to change apccontrol, I'm saying about customize the commands:

Create a script `doshutdown` in /usr/local/etc/apcupsd.

Example:

-----------------------------------------------------------------------
#!/bin/sh

stop bhyve
stop other main deamons
check for whatever you want
and then,
/sbin/shutdown -h now "apcupsd initiated shutdown, bhyve is safe..."
exit 99
-----------------------------------------------------------------------

you can even call `apccontrol killpower` in it if you want.
Comment 8 Victor Sudakov 2019-04-27 15:16:48 UTC
(In reply to Danilo G. Baio from comment #7)
> Create a script `doshutdown` in /usr/local/etc/apcupsd.

Oh, this may be a solution. Thank you. I'll try it out.

> you can even call `apccontrol killpower` in it if you want.

This would probably require starting apcupsd initially with --term-on-powerfail instead of --kill-on-powerfail, right?
Comment 9 Danilo G. Baio freebsd_committer 2019-04-27 22:15:52 UTC
I suggest you to remove all the flags, create this custom script to stop your VMs and insert `apccontrol killpower` command at the end, just before the `shutdown`.

Regards.
Comment 10 Victor Sudakov 2019-04-28 03:51:56 UTC
(In reply to Danilo G. Baio from comment #9)
>I suggest you to remove all the flags,

This means an instance of apcupsd will remain running during shutdown, while another instance of apcupsd will be trying to kill the UPS. Are you sure they will not conflict for access to the UPS?

> insert `apccontrol killpower` command at the end, just before the `shutdown`.

But `apccontrol killpower` is a NOOP, all actions are commented out in apccontrol.
Comment 11 Danilo G. Baio freebsd_committer 2019-04-28 14:39:43 UTC
(In reply to Victor Sudakov from comment #10)

You are right, `apccontrol killall` is commented out, but you can use `apcupsd --killpower` (with -f config-file) directly in the custom doshutdown.

I didn't know about the two instances of apcupsd, but I think it's another reason for you use custom commands.

Do some tests and let us know, we can add some hints about this in the pkg-message, and add some custom commands as examples.

Regards.
Comment 12 Victor Sudakov 2019-04-29 02:42:34 UTC
(In reply to Danilo G. Baio from comment #11)
> you can use `apcupsd --killpower` (with -f config-file) directly in the custom doshutdown.

Before or after "shutdown -h", in your opinion?

On the other hand, the more I think of this custom doshutdown solution, the less I like it. You know why? Because in fact it does not eliminate the race between the system shutdown procedure and the UPS grace time.

The correct way would be to place `apcupsd --killpower` somewhere in the shutdown procedure itself, as close to the end thereof as possible. 

The good news is that `apcupsd --killpower` tests if /var/run/powerfail exists and refuses to kill power if it does not, so there is even no need for an additional check.
Comment 13 Victor Sudakov 2019-05-01 06:23:27 UTC
(In reply to Danilo G. Baio from comment #11)
> Do some tests and let us know,

My conclusion in Russian is now here: https://victor-sudakov.dreamwidth.org/470378.html

An English translation will be ready shortly, maybe tonight.
Comment 14 Victor Sudakov 2019-05-01 11:51:13 UTC
My final conclusion. The default way apcupsd is started by the port (with --kill-on-powerfail) is wrong. You cannot rely on the UPS grace shutdown delay even with the default setting of rcshutdown_timeout="90", while using virtual machines necessitates making rcshutdown_timeout even longer. A race between the UPS grace delay and the FreeBSD shutdown procedure must be eradicated.

The only feasible way of doing it is:

1. Starting apcupsd with apcupsd_flags="--term-on-powerfail" so that the daemon exits once it has started the doshutdown procedure and does not send any commands to the UPS at this stage.

2. Putting the line
"test -f /var/run/powerfail && /usr/local/sbin/apcupsd --hibernate"
or
"test -f /var/run/powerfail && /usr/local/sbin/apcupsd --power-off"

(depending on the desired behaviour of the UPS after the mains is restored) at the very end of the /etc/rc.shutdown script, after the "Insert other shutdown procedures here" line.

Thus the shutdown procedure started by "apcupsd --term-on-powerfail" can proceed at its own tempo, with "apcupsd --hibernate" being called when all the daemons and VMs have been safely shut down.
Comment 15 Danilo G. Baio freebsd_committer 2019-05-05 19:57:53 UTC
(In reply to Victor Sudakov from comment #14)

Thank you Victor.

I want to insert this in the pkg-message:

```
For default, apcupsd starts with `--kill-on-powerfail` parameter.
Please, read its man page, and if this is not the intended behavior you want,
change it accordingly.

In some systems where the shutdown can take a while (like in bhyve environment),
you may want to change this behavior as follows:

Set apcupsd_flags="--term-on-powerfail" on your /etc/rc.conf[.local].

Add this to /etc/rc.shutdown, after the "Insert other shutdown procedures here"
line:
  test -f /var/run/powerfail && /usr/local/sbin/apcupsd --hibernate
  or
  test -f /var/run/powerfail && /usr/local/sbin/apcupsd --power-off
```

Do you want to add something more?

Regards.
Comment 16 Victor Sudakov 2019-05-06 04:07:51 UTC
(In reply to Danilo G. Baio from comment #15)
Seems fine to me. 

I hope it will be the first step to changing the way apcupsd is integrated into the system, maybe inspired by Gentoo.
Comment 17 commit-hook freebsd_committer 2019-05-11 18:31:17 UTC
A commit references this bug:

Author: dbaio
Date: Sat May 11 18:30:41 UTC 2019
New revision: 501281
URL: https://svnweb.freebsd.org/changeset/ports/501281

Log:
  sysutils/apcupsd: Improve pkg-message

  In some systems where the shutdown can take a while (like in bhyve
  environment), users may want to change the default shutdown behavior.

  PR:		237600
  Reported by:	Victor Sudakov <vas@mpeks.tomsk.su>

Changes:
  head/sysutils/apcupsd/files/pkg-message.in
Comment 18 Danilo G. Baio freebsd_committer 2019-05-11 18:32:45 UTC
Thank you!
Comment 19 Victor Sudakov 2019-09-18 14:52:07 UTC
The new procedure still sucks. I had a power failure today and again found my bhyve guests with "/ was not properly dismounted" .

We need someone really smart to figure out where the race is. I'm not smart enough to solve the case.