Bug 261284 - bhyve emulation of 12.3 on 12.3 frequently crashes
Summary: bhyve emulation of 12.3 on 12.3 frequently crashes
Status: Closed DUPLICATE of bug 261338
Alias: None
Product: Base System
Classification: Unclassified
Component: bhyve
Version: 12.3-STABLE
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: freebsd-virtualization (Nobody)
Reported: 2022-01-17 17:46 UTC by Martin Birgmeier
Modified: 2022-01-22 07:43 UTC

Attachments
core.txt.0, info.0 (25.27 KB, application/x-compressed-tar)
2022-01-17 17:46 UTC, Martin Birgmeier
crash info from just described crash (22.47 KB, application/x-compressed-tar)
2022-01-17 19:15 UTC, Martin Birgmeier

Description Martin Birgmeier 2022-01-17 17:46:09 UTC
Created attachment 231089
core.txt.0, info.0

Scenario:
- Host system (hal) recently updated from 12.2 to 12.3 (amd64)
- Client system (v909) running 12.3 i386 off a ZFS volume
- /usr/src and /usr/obj are on hal and reachable by the client via NFS
- "cd /usr/src && make installkernel installworld DESTDIR=/usr/tmp/x ..."

Result:
- Normally, this setup works (mostly) o.k.
- But with this boot of the host (hal), the client (v909) always crashes whenever doing the above install
- Crash info attached

Notes:
- In 12.3 there seems to be a regression in bhyve because quite often clients running 13.0 just stop in the boot process (zero CPU in the host). They have to be killed using bhyvectl, and in most cases after a restart they boot normally. This seems to indicate some uninitialized data affecting 13.0 clients (amd64 clients more than i386).
- In the special case described in this PR, the 12.3 i386 client reliably crashes (I have tried it three times so far without rebooting the host). I am fairly sure that once I reboot the host the problem will have gone away. This again seems to indicate some uninitialized data affecting, in this case, a 12.3 i386 client.
- In 12.2 (host) there were basically no bhyve problems for both 12.2 and 13.0 clients (amd64 and i386).
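
For readers hitting the same boot hang, the teardown of a stuck guest mentioned in the notes above can be sketched as follows. This is only an illustration, not the reporter's actual procedure: the helper name hung_guest_cmds is hypothetical, and it prints the commands instead of running them so they can be reviewed first. The --force-poweroff and --destroy options are documented in bhyvectl(8).

```shell
#!/bin/sh
# Hypothetical helper: print (do not execute) the bhyvectl commands
# that power off and then destroy a hung guest, e.g. "v909".
hung_guest_cmds() {
    vm="$1"
    # Ask the hypervisor to power the guest off without its cooperation.
    printf 'bhyvectl --vm=%s --force-poweroff\n' "$vm"
    # Release the VM instance so the guest can be started again.
    printf 'bhyvectl --vm=%s --destroy\n' "$vm"
}

# Usage on the host, after reviewing the output:
# hung_guest_cmds v909 | sh
```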

Summary: There seems to be a regression in bhyve from 12.2 to 12.3, most likely with some uninitialized data structures, most likely in the host, but probably also in the client.

-- Martin
Comment 1 Martin Birgmeier 2022-01-17 18:21:33 UTC
Scenario (continued):
- Host (hal) rebooted

Result:
- Now the client just hangs shortly after starting the install; the host shows 100% CPU load (out of 400%), and the client no longer responds to ping
Comment 2 Martin Birgmeier 2022-01-17 19:11:51 UTC
Scenario (continued):
- rolled the ZFS volume backing v909 back to yesterday's state
- restarted the client (v909)
- In the client:

[0]% cd /usr/src ..................... this is mounted via NFS from hal using automount
[0]% df .
Filesystem  1K-blocks Used Avail Capacity  Mounted on
map -hosts3         0    0     0   100%    /net
[0]# 

- Now this is very strange, because it should be the NFS mount!

[0]# ls
...
[0]# df .
Filesystem                              1K-blocks    Used     Avail Capacity  Mounted on
hal.1/1/SRC/FreeBSD/src/MBi/releng/12.3 750831648 3405803 747425845     0%    /z/SRC/FreeBSD/src/MBi/releng/12.3
[0]# 

- Now it shows the right mountpoint
- Starting "make installkernel installworld..."

Result:
- After a (longer) time, the machine crashes again.
Comment 3 Martin Birgmeier 2022-01-17 19:15:51 UTC
Created attachment 231091
crash info from just described crash
Comment 4 Martin Birgmeier 2022-01-17 19:19:26 UTC
Scenario (continued):
- Again restarted v909, file system check repairs everything...
- Retrying the cd/df command from before:

[0]# cd /usr/src
[0]# df .
Filesystem  1K-blocks Used Avail Capacity  Mounted on
map -hosts3         0    0     0   100%    /net
[0]# ls
.arcconfig              include                 release
.arclint                kerberos5               rescue
.cirrus.yml             lib                     sbin
.git                    libexec                 secure
.gitattributes          LOCKS                   share
.gitignore              MAINTAINERS             stand
bin                     Makefile                sys
cddl                    Makefile.inc1           targets
contrib                 Makefile.libcompat      tests
COPYRIGHT               Makefile.sys.inc        tools
crypto                  ObsoleteFiles.inc       UPDATING
etc                     README                  usr.bin
gnu                     README.md               usr.sbin
[0]# df .
Filesystem                             1K-blocks    Used     Avail Capacity  Mounted on
hal:/z/SRC/FreeBSD/src/MBi/releng/12.3 750720567 3405803 747314764     0%    /net/hal/z/SRC/FreeBSD/src/MBi/releng/12.3
[0]# 
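
The two df results above differ only because the automounter had not yet mounted the NFS export when df first ran. A minimal sketch for telling the two states apart in a script is shown here; the function name classify_df is hypothetical, and it assumes df output in the format shown in this comment (one header line followed by one data line).

```shell
#!/bin/sh
# Hypothetical helper: read `df` output on stdin (header plus one data
# line) and classify the Filesystem column of the data line.
classify_df() {
    awk 'NR == 2 {
        if ($1 == "map")            print "automount-map"  # autofs map entry, nothing mounted yet
        else if ($1 ~ /^[^:]+:\//)  print "nfs"            # host:/path form of an NFS mount
        else                        print "other"
    }'
}

# Usage in the guest: access the directory first so the automounter
# mounts it, then classify what df reports.
# ls /usr/src >/dev/null && df /usr/src | classify_df
```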

- But I guess this is not the reason for the crash.
- I am at a loss as to how to proceed now; the last time installkernel/installworld ran successfully was when this machine was upgraded to 12.3-p1.

-- Martin
Comment 5 Mark Johnston freebsd_committer freebsd_triage 2022-01-21 19:07:05 UTC
12.3-p1 i386 guests crashing sounds like PR 261338.
Comment 6 Martin Birgmeier 2022-01-21 19:19:28 UTC
Thanks for the pointer, I'll try the patch from bug #261338.

-- Martin
Comment 7 Martin Birgmeier 2022-01-22 07:43:54 UTC
This seems fixed by the patch in issue #261338.

-- Martin

*** This bug has been marked as a duplicate of bug 261338 ***