Bug 251531 - unclean dhclient shutdown due to r366857 (r367049)
Summary: unclean dhclient shutdown due to r366857 (r367049)
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: conf
Version: 12.2-STABLE
Hardware: Any Any
Importance: --- Affects Some People
Assignee: Cy Schubert
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-12-02 14:56 UTC by Helge Oldach
Modified: 2020-12-07 15:45 UTC
CC List: 5 users

See Also:


Attachments
Try this (319 bytes, patch)
2020-12-02 16:07 UTC, Cy Schubert
Revert and create a new script. (2.84 KB, patch)
2020-12-03 02:07 UTC, Cy Schubert
Tested and works. (1.99 KB, patch)
2020-12-03 04:20 UTC, Cy Schubert
Be a little more aggressive. (2.00 KB, patch)
2020-12-03 14:22 UTC, Cy Schubert
That was wrong. (1.99 KB, patch)
2020-12-03 14:52 UTC, Cy Schubert
Only shut down cloned interfaces (7.84 KB, patch)
2020-12-03 16:08 UTC, Cy Schubert

Description Helge Oldach 2020-12-02 14:56:01 UTC
Since base r366857 (MFC to 12-STABLE: base r367049), dhclient gets killed uncleanly upon interface destruction.

Arguably this is just cosmetic, however this is new behaviour and should probably be tidied up.

Stopping devd.
Waiting for PIDS: 797.
lo0: link state changed to DOWN
Dec  2 15:30:29 dtk59 dhclient[785]: receive_packet failed on bge0.31: Device not configured
Dec  2 15:30:29 dtk59 dhclient[785]: ioctl(SIOCGIFFLAGS) on bge0.31: Operation not permitted
Dec  2 15:30:29 dtk59 dhclient[785]: Interface bge0.31 no longer appears valid.
Dec  2 15:30:29 dtk59 dhclient[785]: No live interfaces to poll on - exiting.
Dec  2 15:30:29 dtk59 dhclient[785]: exiting.
Dec  2 15:30:29 dtk59 dhclient[785]: connection closed
Dec  2 15:30:29 dtk59 dhclient[785]: exiting.
/etc/rc.shutdown: WARNING: bge0.21 does not exist.  Skipped.
/etc/rc.shutdown: WARNING: bge0.31 does not exist.  Skipped.
/etc/rc.shutdown: WARNING: bge0.32 does not exist.  Skipped.
Stopping Network: lo0 bge0 em0 em1.
Comment 1 Cy Schubert freebsd_committer freebsd_triage 2020-12-02 16:07:07 UTC
Created attachment 220173 [details]
Try this

I don't see these messages here. However, my dhclient listens on lagg0 (bge0 with failover to wlan0). Try this patch to see if it helps.
Comment 2 Helge Oldach 2020-12-02 16:50:33 UTC
(In reply to Cy Schubert from comment #1)
That didn't help; the same messages still appear.

Even worse, there is now an additional error:

Writing entropy file: .
Writing early boot entropy file: .
/etc/rc.shutdown: ERROR: /etc/rc.shutdown: no interface specified
Terminated
.

In my case, bge0 has no IP address of its own, but it has three subinterfaces on three different VLANs. One of them (bge0.31) has dhclient listening; the other two are static.
Comment 3 Cy Schubert freebsd_committer freebsd_triage 2020-12-02 17:40:49 UTC
That explains why I don't see the problem.

Can you post your rc.conf, please?
Comment 4 Helge Oldach 2020-12-02 18:28:05 UTC
(In reply to Cy Schubert from comment #3)
The network related part:

kld_list="if_bge if_em snd_hda ichsmb smb crypto"
ifconfig_bge0="up"
ifconfig_bge0_21="inet 172.16.126.9 netmask 255.255.255.0"
ifconfig_bge0_31="DHCP fib 1"
ifconfig_bge0_32="inet 192.168.0.7 netmask 255.255.255.0"
vlans_bge0="21 31 32"
defaultrouter="172.16.126.1"
route_default_fib1="default 192.168.0.1 -fib 1"
static_routes="default_fib1"

Yes, I tend to keep things terse.

IMHO it's related to base r366857 (MFC to 12-STABLE: base r367049). While the rationale behind reaping cloned interfaces during shutdown is certainly valid, it apparently happens too early, at a stage when processes are still listening and may get confused if their interface is destroyed. dhclient is just one example; there are likely others. They may not complain as verbosely as dhclient, but their unclean shutdown could still trigger undesired side effects.
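
For readers unfamiliar with the naming, the child interfaces implied by the rc.conf excerpt above can be derived as in the sketch below. It is only an illustration in plain sh: `list_vlan_children` and `is_dhcp_child` are hypothetical helpers, not part of the rc framework. It assumes that numeric `vlans_<parent>` entries become `<parent>.<tag>` and that the matching variable is `ifconfig_<parent>_<tag>`, as the excerpt suggests.

```shell
#!/bin/sh
# Values copied from the rc.conf excerpt in this comment.
vlans_bge0="21 31 32"
ifconfig_bge0_21="inet 172.16.126.9 netmask 255.255.255.0"
ifconfig_bge0_31="DHCP fib 1"
ifconfig_bge0_32="inet 192.168.0.7 netmask 255.255.255.0"

# list_vlan_children PARENT: print one child interface name per line.
# Hypothetical helper for illustration only.
list_vlan_children() {
    parent=$1
    eval "tags=\${vlans_${parent}}"
    for tag in $tags; do
        echo "${parent}.${tag}"
    done
}

# is_dhcp_child IFNAME: succeed if the matching ifconfig_* variable
# contains the word DHCP. Hypothetical helper for illustration only.
is_dhcp_child() {
    var="ifconfig_$(echo "$1" | tr '.' '_')"
    eval "cfg=\${${var}}"
    case " $cfg " in
        *" DHCP "*) return 0 ;;
        *) return 1 ;;
    esac
}

for child in $(list_vlan_children bge0); do
    if is_dhcp_child "$child"; then
        echo "$child: dhclient"
    else
        echo "$child: static"
    fi
done
```

With the values above, only bge0.31 is reported as a dhclient interface, which matches the interface named in the log messages.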
Comment 5 Cy Schubert freebsd_committer freebsd_triage 2020-12-03 02:07:02 UTC
Created attachment 220201 [details]
Revert and create a new script.

This will run netif stop at the end of shutdown.
Comment 6 Cy Schubert freebsd_committer freebsd_triage 2020-12-03 04:20:38 UTC
Created attachment 220203 [details]
Tested and works.

This should fix the problem.
Comment 7 Helge Oldach 2020-12-03 07:24:07 UTC
Hmmm... doesn't change a bit for me:

Dec  3 08:11:29 dtk59 kernel: lo0: link state changed to DOWN
Dec  3 08:11:29 dtk59 dhclient[772]: receive_packet failed on bge0.31: Device not configured
Dec  3 08:11:29 dtk59 dhclient[772]: ioctl(SIOCGIFFLAGS) on bge0.31: Operation not permitted
Dec  3 08:11:29 dtk59 dhclient[772]: Interface bge0.31 no longer appears valid.
Dec  3 08:11:29 dtk59 dhclient[772]: No live interfaces to poll on - exiting.
Dec  3 08:11:29 dtk59 dhclient[772]: exiting.
Dec  3 08:11:29 dtk59 dhclient[772]: connection closed
Dec  3 08:11:29 dtk59 dhclient[772]: exiting.
Dec  3 08:11:30 dtk59 root[1764]: /etc/rc.shutdown: WARNING: bge0.21 does not exist.  Skipped.
Dec  3 08:11:30 dtk59 root[1766]: /etc/rc.shutdown: WARNING: bge0.31 does not exist.  Skipped.
Dec  3 08:11:30 dtk59 root[1768]: /etc/rc.shutdown: WARNING: bge0.32 does not exist.  Skipped.
Dec  3 08:11:30 dtk59 syslogd: exiting on signal 15

I rebooted twice and double-checked netif and netifshutdown, but it doesn't seem to work.

rcorder -pkshutdown /etc/rc.d/*

/etc/rc.d/swap
/etc/rc.d/mountcritlocal
/etc/rc.d/gssd
/etc/rc.d/zfs
/etc/rc.d/mixer /etc/rc.d/ugidfw /etc/rc.d/random
/etc/rc.d/addswap
/etc/rc.d/ipfs
/etc/rc.d/rtsold /etc/rc.d/devd
/etc/rc.d/zfsd
/etc/rc.d/local_unbound /etc/rc.d/netifdown
/etc/rc.d/kdc /etc/rc.d/nfsuserd /etc/rc.d/kfd
/etc/rc.d/ipropd_slave /etc/rc.d/hostapd /etc/rc.d/nfscbd /etc/rc.d/kpasswdd /etc/rc.d/kadmind /etc/rc.d/ipropd_master
/etc/rc.d/hastd /etc/rc.d/localpkg /etc/rc.d/watchdogd /etc/rc.d/bsnmpd /etc/rc.d/auditd
/etc/rc.d/auditdistd /etc/rc.d/rpcbind
/etc/rc.d/nfsclient
/etc/rc.d/ypserv
/etc/rc.d/ypbind /etc/rc.d/ypupdated /etc/rc.d/ypxfrd /etc/rc.d/ypldap
/etc/rc.d/ypset
/etc/rc.d/amd /etc/rc.d/yppasswdd /etc/rc.d/keyserv
/etc/rc.d/automount /etc/rc.d/mountd
/etc/rc.d/nfsd
/etc/rc.d/statd
/etc/rc.d/lockd
/etc/rc.d/timed /etc/rc.d/rwho /etc/rc.d/rtadvd /etc/rc.d/hcsecd /etc/rc.d/utx /etc/rc.d/powerd /etc/rc.d/ntpd /etc/rc.d/nscd /etc/rc.d/lpd /etc/rc.d/sdpd /etc/rc.d/local /etc/rc.d/ubthidhci /etc/rc.d/ftp-proxy /etc/rc.d/moused
/etc/rc.d/bthidd /etc/rc.d/swaplate
/etc/rc.d/ftpd /etc/rc.d/jail /etc/rc.d/cron /etc/rc.d/sshd /etc/rc.d/inetd
Comment 8 Helge Oldach 2020-12-03 09:05:25 UTC
Is this perhaps related to 12-STABLE only? My machine is on base r368270.
Comment 9 Helge Oldach 2020-12-03 09:38:19 UTC
I ran the same rcorder without the patch applied. The rcorder impact of the patch is:

 /etc/rc.d/mixer /etc/rc.d/ugidfw /etc/rc.d/random
 /etc/rc.d/addswap
 /etc/rc.d/ipfs
-/etc/rc.d/netif
 /etc/rc.d/rtsold /etc/rc.d/devd
 /etc/rc.d/zfsd
-/etc/rc.d/local_unbound
+/etc/rc.d/local_unbound /etc/rc.d/netifdown
 /etc/rc.d/kdc /etc/rc.d/nfsuserd /etc/rc.d/kfd
 /etc/rc.d/ipropd_slave /etc/rc.d/hostapd /etc/rc.d/nfscbd /etc/rc.d/kpasswdd /etc/rc.d/kadmind /etc/rc.d/ipropd_master
 /etc/rc.d/hastd /etc/rc.d/localpkg /etc/rc.d/watchdogd /etc/rc.d/bsnmpd /etc/rc.d/auditd

So it looks like the netif stuff is just moved a few steps earlier.

It doesn't look to me like any of the rc scripts within the changed window above would shut down dhclient in an orderly way, or am I mistaken?
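
The window described above can be listed mechanically. The following is a minimal sketch, not something taken from the patches: `scripts_between` is a hypothetical helper, and the sample input is the diff context from this comment with both the removed netif entry and the added netifdown entry kept, so that the old and new positions appear in one list.

```shell
#!/bin/sh
# scripts_between FIRST LAST: read an rcorder-style listing on stdin
# (one or more paths per line) and print the entries that appear
# strictly between FIRST and LAST. Hypothetical helper.
scripts_between() {
    tr -s ' ' '\n' | awk -v first="$1" -v last="$2" '
        $0 == last  { inside = 0 }
        inside      { print }
        $0 == first { inside = 1 }
    '
}

# Sample input assembled from the diff in this comment (old netif
# position and new netifdown position merged into one listing).
order='/etc/rc.d/ipfs
/etc/rc.d/netif
/etc/rc.d/rtsold /etc/rc.d/devd
/etc/rc.d/zfsd
/etc/rc.d/local_unbound /etc/rc.d/netifdown
/etc/rc.d/kdc /etc/rc.d/nfsuserd /etc/rc.d/kfd'

printf '%s\n' "$order" | scripts_between /etc/rc.d/netif /etc/rc.d/netifdown
```

On a live system the same function could be fed from `rcorder -p -k shutdown /etc/rc.d/*` instead of the sample string.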
Comment 10 Cy Schubert freebsd_committer freebsd_triage 2020-12-03 14:22:12 UTC
Created attachment 220219 [details]
Be a little more aggressive.

First, rc.shutdown doesn't use the -p flag, and there is no way to specify it in rc.conf. I'm not sure why you used it in your tests. Anyhow, it doesn't matter; people can still change rc.shutdown themselves. This patch takes this into account.
Comment 11 Cy Schubert freebsd_committer freebsd_triage 2020-12-03 14:52:29 UTC
Created attachment 220220 [details]
That was wrong.

The original patch was correct. I didn't catch your mistake. To test, run the command below. Remember, rc.shutdown reverses the order it receives from rcorder. Also, rc.shutdown doesn't use the -p flag (unless you've customized your rc and rc.shutdown).

slippy$ rcorder -kshutdown /etc/rc.d/* /usr/local/etc/rc.d/* | tail -r
/usr/local/etc/rc.d/fetchmail
/usr/local/etc/rc.d/smartd
/usr/local/etc/rc.d/apache24
/usr/local/etc/rc.d/cbsd-statsd-bhyve
/usr/local/etc/rc.d/cbsd-statsd-hoster
/usr/local/etc/rc.d/cbsd-statsd-jail
/usr/local/etc/rc.d/ezjail
/usr/local/etc/rc.d/postgresql
/usr/local/etc/rc.d/dma_flushq
/usr/local/etc/rc.d/postfix
/usr/local/etc/rc.d/zabbix_proxy
/usr/local/etc/rc.d/zabbix_server
/usr/local/etc/rc.d/cbsdd
/usr/local/etc/rc.d/nagios
/usr/local/etc/rc.d/squid
/usr/local/etc/rc.d/nginx
/usr/local/etc/rc.d/nrpe3
/usr/local/etc/rc.d/nsd
/usr/local/etc/rc.d/ntimed
/usr/local/etc/rc.d/vnstat
/usr/local/etc/rc.d/volmand
/usr/local/etc/rc.d/virtlogd
/usr/local/etc/rc.d/cbsdrsyncd
/usr/local/etc/rc.d/phd
/usr/local/etc/rc.d/php-fpm
/usr/local/etc/rc.d/postfwd
/usr/local/etc/rc.d/spamass-milter
/etc/rc.d/sshd
/usr/local/etc/rc.d/cf-execd
/usr/local/etc/rc.d/qjail.bootime
/usr/local/etc/rc.d/socat
/usr/local/etc/rc.d/rplayd
/usr/local/etc/rc.d/cf-monitord
/usr/local/etc/rc.d/rsyncd
/usr/local/etc/rc.d/saned
/usr/local/etc/rc.d/cf-serverd
/usr/local/etc/rc.d/vboxwebsrv
/usr/local/etc/rc.d/dovecot
/usr/local/etc/rc.d/sa-spamd
/usr/local/etc/rc.d/darkstat
/usr/local/etc/rc.d/vboxwatchdog
/usr/local/etc/rc.d/vboxheadless
/etc/rc.d/ftpd
/usr/local/etc/rc.d/firebird
/usr/local/etc/rc.d/fossil
/usr/local/etc/rc.d/swapmon
/usr/local/etc/rc.d/mysql-server
/usr/local/etc/rc.d/xdm
/etc/rc.d/inetd
/etc/rc.d/jail
/usr/local/etc/rc.d/atop
/usr/local/etc/rc.d/jail.jailconf.bootime
/usr/local/etc/rc.d/jail.rcconf.bootime
/usr/local/etc/rc.d/svscan
/usr/local/etc/rc.d/jailrc
/usr/local/etc/rc.d/jenkins
/usr/local/etc/rc.d/sshguard
/usr/local/etc/rc.d/kpropd
/usr/local/etc/rc.d/libvirtd
/usr/local/etc/rc.d/mailman
/usr/local/etc/rc.d/mbmon
/usr/local/etc/rc.d/bastille
/usr/local/etc/rc.d/vpnc
/usr/local/etc/rc.d/memcached
/etc/rc.d/cron
/usr/local/etc/rc.d/cups_browsed
/usr/local/etc/rc.d/cupsd
/usr/local/etc/rc.d/samba_server
/usr/local/etc/rc.d/mrtg_daemon
/etc/rc.d/swaplate
/etc/rc.d/bthidd
/usr/local/etc/rc.d/webcamd
/etc/rc.d/ubthidhci
/usr/local/etc/rc.d/diskcheckd
/usr/local/etc/rc.d/fail2ban
/usr/local/etc/rc.d/git_daemon
/etc/rc.d/moused
/usr/local/etc/rc.d/httpry
/usr/local/etc/rc.d/innd
/usr/local/etc/rc.d/isc-dhcrelay
/usr/local/etc/rc.d/isc-dhcrelay6
/usr/local/etc/rc.d/mdnsd
/usr/local/etc/rc.d/mdnsresponderposix
/usr/local/etc/rc.d/openssh
/usr/local/etc/rc.d/openvpn
/usr/local/etc/rc.d/oss
/usr/local/etc/rc.d/pfstatd
/usr/local/etc/rc.d/poudriered
/usr/local/etc/rc.d/rrdcached
/usr/local/etc/rc.d/slpd
/etc/rc.d/ftp-proxy
/usr/local/etc/rc.d/imap4d
/usr/local/etc/rc.d/comsatd
/usr/local/etc/rc.d/swapd
/usr/local/etc/rc.d/swapexd
/usr/local/etc/rc.d/swatchdog
/usr/local/etc/rc.d/uhidd
/usr/local/etc/rc.d/watchd
/usr/local/etc/rc.d/zabbix_agentd
/usr/local/etc/rc.d/timed
/usr/local/etc/rc.d/svnserve
/usr/local/etc/rc.d/snort
/usr/local/etc/rc.d/rinetd
/usr/local/etc/rc.d/proftpd
/usr/local/etc/rc.d/monitorix
/usr/local/etc/rc.d/isc-dhcpd6
/usr/local/etc/rc.d/isc-dhcpd
/usr/local/etc/rc.d/ipv6mon
/usr/local/etc/rc.d/gkrellmd
/usr/local/etc/rc.d/doinkd
/usr/local/etc/rc.d/dhcp6s
/usr/local/etc/rc.d/dhcp6relay
/etc/rc.d/utx
/etc/rc.d/rwho
/etc/rc.d/rtadvd
/etc/rc.d/sdpd
/etc/rc.d/powerd
/etc/rc.d/ntpd
/etc/rc.d/nscd
/etc/rc.d/lpd
/etc/rc.d/local
/etc/rc.d/hcsecd
/usr/local/etc/rc.d/conserver
/usr/local/etc/rc.d/saslauthd
/usr/local/etc/rc.d/pop3d
/usr/local/etc/rc.d/milter-opendkim
/etc/rc.d/lockd
/etc/rc.d/statd
/etc/rc.d/nfsd
/etc/rc.d/mountd
/etc/rc.d/automount
/etc/rc.d/keyserv
/usr/local/etc/rc.d/automounter
/etc/rc.d/yppasswdd
/usr/local/etc/rc.d/amd
/etc/rc.d/ypset
/etc/rc.d/ypbind
/etc/rc.d/ypldap
/etc/rc.d/ypxfrd
/etc/rc.d/ypupdated
/usr/local/etc/rc.d/obspamlogd
/etc/rc.d/ypserv
/usr/local/etc/rc.d/radiusd
/usr/local/etc/rc.d/stunnel
/usr/local/etc/rc.d/monit
/usr/local/etc/rc.d/stund
/usr/local/etc/rc.d/obspamd
/usr/local/etc/rc.d/stubby
/usr/local/etc/rc.d/ipacctd
/etc/rc.d/hastd
/usr/local/etc/rc.d/named
/usr/local/etc/rc.d/tcsd
/usr/local/etc/rc.d/tpmd
/usr/local/etc/rc.d/dnsmasq
/usr/local/etc/rc.d/vm
/etc/rc.d/nfsclient
/etc/rc.d/rpcbind
/etc/rc.d/auditdistd
/usr/local/etc/rc.d/ipa
/etc/rc.d/bsnmpd
/usr/local/etc/rc.d/relayd
/etc/rc.d/watchdogd
/etc/rc.d/auditd
/etc/rc.d/localpkg
/usr/local/etc/rc.d/nut_upsmon
/etc/rc.d/hostapd
/etc/rc.d/nfscbd
/etc/rc.d/kpasswdd
/etc/rc.d/kadmind
/usr/local/etc/rc.d/kiconv
/usr/local/etc/rc.d/nut_upslog
/etc/rc.d/ipropd_slave
/etc/rc.d/ipropd_master
/usr/local/etc/rc.d/tproxy
/usr/local/etc/rc.d/tftpd
/etc/rc.d/nfsuserd
/usr/local/etc/rc.d/netdumpd
/usr/local/etc/rc.d/kea
/usr/local/etc/rc.d/ftpsesame
/usr/local/etc/rc.d/distccd
/etc/rc.d/kdc
/etc/rc.d/kfd
/usr/local/etc/rc.d/sndiod
/usr/local/etc/rc.d/nut
/etc/rc.d/netifdown
/usr/local/etc/rc.d/unbound
/etc/rc.d/local_unbound
/etc/rc.d/netwait
/etc/rc.d/zfsd
/etc/rc.d/rtsold
/etc/rc.d/devd
/usr/local/etc/rc.d/slapd
/usr/local/etc/rc.d/mpd5
/usr/local/etc/rc.d/dhcp6c
/etc/rc.d/ipfs
/etc/rc.d/addswap
/etc/rc.d/mixer
/etc/rc.d/ugidfw
/etc/rc.d/random
/usr/local/etc/rc.d/jaildaemon
/etc/rc.d/gssd
/etc/rc.d/mountcritlocal
/etc/rc.d/swap
slippy$ 

Anyhow, I've tested it here and it works. This fixes the problem jhb@ had.

Your dhclient will still complain because VLANs are not shut down gracefully when their parent interfaces are. This is a different problem that might require an additional loop.
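
As a rough illustration of what such an additional loop might look like, the sketch below prints a children-first teardown plan without executing anything. It is an assumption-laden sketch and not one of the attached patches: `teardown_plan` is a hypothetical helper, and the printed commands (`service dhclient stop <ifn>`, `ifconfig <ifn> destroy`, `ifconfig <ifn> down`) are one plausible sequence rather than what netif actually does.

```shell
#!/bin/sh
# teardown_plan PARENT "DHCP_TAGS" TAG...
# Print (dry run only) the commands a children-first teardown might
# issue: stop dhclient on DHCP children, destroy each VLAN child, and
# only then bring the parent down. Hypothetical helper.
teardown_plan() {
    parent=$1
    dhcp_tags=$2
    shift 2
    for tag in "$@"; do
        child="${parent}.${tag}"
        case " $dhcp_tags " in
            *" $tag "*) echo "service dhclient stop ${child}" ;;
        esac
        echo "ifconfig ${child} destroy"
    done
    echo "ifconfig ${parent} down"
}

# Interface names and tags taken from the reporter's rc.conf.
teardown_plan bge0 "31" 21 31 32
```

The point of the ordering is that dhclient on bge0.31 is stopped before its interface disappears, and the parent bge0 is touched last.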
Comment 12 Helge Oldach 2020-12-03 16:02:45 UTC
(In reply to Cy Schubert from comment #10)
Sorry for confusing with the -p flag. It was just cosmetic sugar meant to avoid ambiguity in the sequence.

The net observation is simply that

/etc/rc.d/rtsold, 
/etc/rc.d/devd, 
/etc/rc.d/zfsd, and
/etc/rc.d/local_unbound

are the scripts that are run between /etc/rc.d/netif (previously) and /etc/rc.d/netifdown (after patch). The only exception is that /etc/rc.d/local_unbound may run before /etc/rc.d/netifdown *or not* as these two have the same "rank".

The expected outcome of the patch would be that one of these four scripts or /etc/rc.d/netifdown would stop dhclient. But apparently none does. Also I think that none of the four scripts would ever touch anything related to a network interface, but maybe I'm mistaken.

So I think your conclusion is correct: client interfaces should be shut down gracefully before the parent, but it seems they aren't, contrary to the commit text of base r366857 (MFC to 12-STABLE: base r367049).
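
The "same rank" observation can be checked against `rcorder -p` output, which places scripts that may run simultaneously on one line. The sketch below is illustrative only: `rank_peers` is a hypothetical helper, and the sample lines are copied from the listing in comment 7.

```shell
#!/bin/sh
# rank_peers SCRIPT: read `rcorder -p` style output on stdin and print
# the other scripts that share a line with SCRIPT. Hypothetical helper.
rank_peers() {
    awk -v target="$1" '
        {
            found = 0
            for (i = 1; i <= NF; i++) if ($i == target) found = 1
            if (found)
                for (i = 1; i <= NF; i++) if ($i != target) print $i
        }
    '
}

# Sample lines copied from the rcorder -p listing in comment 7.
ranks='/etc/rc.d/zfsd
/etc/rc.d/local_unbound /etc/rc.d/netifdown
/etc/rc.d/kdc /etc/rc.d/nfsuserd /etc/rc.d/kfd'

printf '%s\n' "$ranks" | rank_peers /etc/rc.d/netifdown
```

For the sample input this prints only /etc/rc.d/local_unbound, so the relative order of those two scripts is not fixed by the dependency graph.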
Comment 13 Cy Schubert freebsd_committer freebsd_triage 2020-12-03 16:08:19 UTC
Created attachment 220222 [details]
Only shut down cloned interfaces

There were two other issues that jhb@ and emaste@ had identified with r366857. My thinking is now to only shut down cloned interfaces as John had initially suggested.
Comment 14 Helge Oldach 2020-12-03 16:36:10 UTC
(In reply to Cy Schubert from comment #13)
That fixes it! Silent shutdown as before base r366857. Please MFC as well.
Comment 15 commit-hook freebsd_committer freebsd_triage 2020-12-04 19:31:20 UTC
A commit references this bug:

Author: cy
Date: Fri Dec  4 19:31:16 UTC 2020
New revision: 368345
URL: https://svnweb.freebsd.org/changeset/base/368345

Log:
  Revert r366857.

  r366857 created a number of problems, tearing down interfaces too
  early in shutdown. This resulted in:

  - hung ssh sessions when shutting down or rebooting remotely using
    shutdown (I've used exec shutdown, for years, as opposed to simply
    shutdown).

  - NFS mounted filesystems "disappear" prior to unmount.

  - dhclient attached to a VLAN on an interface whose parent interface
    has already shut down prints errors.

  The path forward is to teach lagg(4) and vlan(4) about WOL.

  PR:		251531, 251540
  PR:		158734, 109980 are broken again
  Reported by:	jhb, emaste, jtl, Helge Oldach<freebsd_oldach.net>
  		Martin Birgmeier <d8zNeCFG_aon.at>
  MFC after:      Immediately
  Discussion at:	https://reviews.freebsd.org/D27459

Changes:
  head/libexec/rc/rc.d/netif
Comment 16 commit-hook freebsd_committer freebsd_triage 2020-12-04 19:36:29 UTC
A commit references this bug:

Author: cy
Date: Fri Dec  4 19:35:43 UTC 2020
New revision: 368346
URL: https://svnweb.freebsd.org/changeset/base/368346

Log:
  Revert r366857.

  r366857 created a number of problems, tearing down interfaces too
  early in shutdown. This resulted in:

  - hung ssh sessions when shutting down or rebooting remotely using
    shutdown (I've used exec shutdown, for years, as opposed to simply
    shutdown).

  - NFS mounted filesystems "disappear" prior to unmount.

  - dhclient attached to a VLAN on an interface whose parent interface
    has already shut down prints errors.

  The path forward is to teach lagg(4) and vlan(4) about WOL.

  PR:		251531, 251540
  PR:		158734, 109980 are broken again
  Reported by:	jhb, emaste, jtl, Helge Oldach<freebsd_oldach.net>
  		Martin Birgmeier <d8zNeCFG_aon.at>
  Discussion at:	https://reviews.freebsd.org/D27459

Changes:
_U  stable/12/
  stable/12/libexec/rc/rc.d/netif
Comment 17 Mark Johnston freebsd_committer freebsd_triage 2020-12-07 15:31:48 UTC
Presumably this PR is resolved by the revert?
Comment 18 Cy Schubert freebsd_committer freebsd_triage 2020-12-07 15:45:01 UTC
Correct. The revert resolves this issue. The PRs the reverted commit fixed are now broken again. Anyone enabling WOL on a lagg(4) member device will discover that WOL will no longer work.

I am working on a kernel patch to add this functionality to the kernel.