Created attachment 231089
core.txt.0, info.0

Scenario:
- Host system (hal) recently updated from 12.2 to 12.3 (amd64)
- Client system (v909) running 12.3 i386 off a ZFS volume
- /usr/src and /usr/obj are on hal and reachable by the client via NFS
- "cd /usr/src && make installkernel installworld DESTDIR=/usr/tmp/x ..."

Result:
- Normally, this setup works (mostly) o.k.
- But with this boot of the host (hal), the client (v909) crashes every time
  the above install is run
- Crash info attached

Notes:
- In 12.3 there seems to be a regression in bhyve, because quite often clients
  running 13.0 just stop during the boot process (zero CPU in the host). They
  have to be killed using bhyvectl, and in most cases they boot normally after
  a restart. This seems to indicate some uninitialized data affecting 13.0
  clients (amd64 clients more than i386).
- In the special case described in this PR, the 12.3 i386 client reliably
  crashes (I have tried it three times so far without rebooting the host).
  I am fairly sure that the problem will go away once I reboot the host.
  This again seems to indicate some uninitialized data affecting, in this
  case, a 12.3 i386 client.
- With 12.2 on the host there were basically no bhyve problems with either
  12.2 or 13.0 clients (amd64 and i386).

Summary:
There seems to be a regression in bhyve from 12.2 to 12.3, most likely with
some uninitialized data structures, most likely in the host, but probably
also in the client.

-- Martin
Scenario (continued):
- Host (hal) rebooted

Result:
- Now the client just hangs a short while after starting the install,
  host load = 100% CPU (out of 400%), no ping to the client possible
Scenario (continued):
- rolled the ZFS volume backing v909 back to yesterday's state
- restarted the client (v909)
- In the client:

[0]% cd /usr/src ..................... this is mounted via NFS from hal using automount
[0%] df .
Filesystem 1K-blocks Used Avail Capacity Mounted on
map -hosts3 0 0 0 100% /net
[0]#

- Now this is very strange, because it should be the NFS mount!

[0]# ls
...
[0]# df .
Filesystem 1K-blocks Used Avail Capacity Mounted on
hal.1/1/SRC/FreeBSD/src/MBi/releng/12.3 750831648 3405803 747425845 0% /z/SRC/FreeBSD/src/MBi/releng/12.3
[0]#

- Now it shows the right mountpoint
- Starting "make installkernel installworld..."

Result:
- After a (longer) time, the machine crashes again.
Created attachment 231091
crash info from just described crash
Scenario (continued):
- Again restarted v909, file system check repairs everything...
- Retrying the cd/df command from before:

[0]# cd /usr/src
[0]# df .
Filesystem 1K-blocks Used Avail Capacity Mounted on
map -hosts3 0 0 0 100% /net
[0]# ls
.arcconfig      include            release
.arclint        kerberos5          rescue
.cirrus.yml     lib                sbin
.git            libexec            secure
.gitattributes  LOCKS              share
.gitignore      MAINTAINERS        stand
bin             Makefile           sys
cddl            Makefile.inc1      targets
contrib         Makefile.libcompat tests
COPYRIGHT       Makefile.sys.inc   tools
crypto          ObsoleteFiles.inc  UPDATING
etc             README             usr.bin
gnu             README.md          usr.sbin
[0]# df .
Filesystem 1K-blocks Used Avail Capacity Mounted on
hal:/z/SRC/FreeBSD/src/MBi/releng/12.3 750720567 3405803 747314764 0% /net/hal/z/SRC/FreeBSD/src/MBi/releng/12.3
[0]#

- But I guess this is not the reason for the crash.
- I am at a loss how to proceed now, the last time the
  installkernel/installworld ran successfully was when this machine was
  upgraded to 12.3-p1.

-- Martin
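
The cd/df sequence above shows that the automounter only replaces the "map" placeholder with the real NFS mount after the directory has been accessed. A minimal sketch of scripting that check before starting an install follows. The function names is_autofs_placeholder and ensure_mounted are hypothetical helpers, not part of FreeBSD, and the sketch assumes that df prints a header line followed by one entry line whose first field is "map" while only the automounter map is visible. It does nothing about the crash itself.

```shell
#!/bin/sh

# Hypothetical helper: succeed if the given `df` output still shows the
# automounter map placeholder (first field of the entry line is "map").
is_autofs_placeholder() {
    # $1: complete output of `df <dir>`; line 2 holds the filesystem entry
    printf '%s\n' "$1" |
        awk 'NR == 2 { found = ($1 == "map") } END { exit (found ? 0 : 1) }'
}

# Hypothetical helper: access the directory to trigger the automount,
# then fail if df still reports the placeholder instead of the NFS mount.
ensure_mounted() {
    dir=$1
    ls "$dir" >/dev/null 2>&1      # accessing the directory triggers the mount
    if is_autofs_placeholder "$(df "$dir")"; then
        echo "still on automounter map: $dir" >&2
        return 1
    fi
    return 0
}
```

It could be used as "ensure_mounted /usr/src && make installkernel installworld ..." with the same arguments as in the scenario.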
12.3-p1 i386 guests crashing sounds like PR 261338.
Thanks for the pointer, I'll try the patch from bug #261338.

-- Martin
This seems fixed by the patch in issue #261338.

-- Martin

*** This bug has been marked as a duplicate of bug 261338 ***