Bug 286703 - During bsdinstall, pkg hangs outputting to serial console
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 14.2-RELEASE
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-sysinstall (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2025-05-09 23:23 UTC by Ravi Pokala
Modified: 2025-05-12 19:58 UTC
0 users

See Also:


Attachments

Description Ravi Pokala 2025-05-09 23:23:57 UTC
Platform: AMD EPYC 9xx5 CPU (aka "Turin", aka "Zen5"). The serial console is uart.1 (0x2f8), which is connected to the BMC's Serial-Over-LAN service.

To install our system, we boot from LAN. The bootloader comes up, finds and loads the kernel and pre-load modules, and boots into an NFS root filesystem. It then runs `bsdinstall' with a pre-generated config file:

> bsdinstall script /bsdinstall.cfg

The install begins normally: it finds the NICs and storage devices, nukes the installation target and creates a new ZFS root there, expands the base system tarball (releng/14.2 + driver backports from stable/14) into the newly-created root, and bootstraps a newer version of `pkg'. It then starts installing our master PKG, which installs various other PKGs (some from the PKG mirrors, some built from ports, and some proprietary) as dependencies.

It is during this stage -- installing dependencies -- that the wheels sometimes fall off the wagon. It will be humming along, and then just stops. The serial console becomes unresponsive, even to the serial debugger sequence ([return] [tilde] [ctrl-b]). The kernel still responds to pings, but because we don't have SSH up and running yet, we can't get much more than that.

We determined that if we use IPMI to trigger an NMI -- from another machine, run `ipmitool -U ${BMC_USER} -P ${BMC_PASSWORD} -H ${BMC_ADDR} chassis power diag' -- then the hung node is able to enter `ddb'. At that point, the serial console becomes responsive again. Running `ps' from the debugger shows that `pkg' is waiting on the console:

```
2267  2266    19     0  S+      ttyout  0xfffff8012e8c68c0  pkg
2266  2224    19     0  S+      wait    0xfffffe14dd714ac0  pkg
2224  1707    19     0  S+      wait    0xfffffe1176466ae0  sh
```

Running `show msgbuf' shows that the last line in the buffer is for a PKG that gets installed later than the PKG mentioned in the last line of console output. That is consistent with the idea that output from `pkg' is getting stuck in the output queue and not being emitted to the console.

At that point, it is possible to `continue' to exit `ddb', and the install resumes and runs to completion.

This suggests a problem with the serial console drivers.

The console config is as follows:

'loader.conf':
```
comconsole_port="0x2f8"
comconsole_speed="115200"
boot_serial="YES"
console="efi"
```

'device.hints':
```
hint.uart.0.at="acpi"
hint.uart.0.port="0x3F8"
hint.uart.0.flags="0x00"
hint.uart.1.at="acpi"
hint.uart.1.port="0x2F8"
hint.uart.1.flags="0x90"
hint.uart.1.baud="115200"
```
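
For reference, the `flags' hints above can be decoded with a short script. This is only a sketch, assuming the bit meanings documented in uart(4) (0x10 = potential system console, 0x80 = remote kernel debugging port); the helper name `decode_uart_flags' and the table are made up for illustration and should be checked against the man page for the release in use.

```python
# Assumption: bit meanings taken from uart(4); verify against your release.
UART_HINT_FLAGS = {
    0x10: "console",     # device is a potential system console
    0x80: "debug port",  # device is used for remote kernel debugging
}


def decode_uart_flags(flags):
    """Return the names of the known hint bits that are set in `flags`."""
    return [name for bit, name in sorted(UART_HINT_FLAGS.items()) if flags & bit]


if __name__ == "__main__":
    # Values copied from the device.hints shown above.
    for unit, flags in ((0, 0x00), (1, 0x90)):
        print(f"uart.{unit}: flags={flags:#04x} -> {decode_uart_flags(flags)}")
```

Under that assumption, uart.1 (0x90) is marked as both the console and the debug port, while uart.0 (0x00) is neither.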
Comment 1 Ravi Pokala 2025-05-12 18:16:50 UTC
My colleagues have gathered some additional information:

1. When in the non-responsive state, (some?) kernel messages are still emitted to the serial console. For example, resetting the BMC causes the virtual USB keyboard and mouse to detach and re-attach, and the corresponding messages show up on the console.

2. When in the non-responsive state, `pkg' is always waiting on the 't_outwait' condvar in tty.c, which wakes up callers waiting on the TTY.

3. The main paths for triggering that event are 'ttydisc_getc()' or 'ttydisc_getc_uio()' in tty_ttydisc.c, which are in turn called by 'uart_tty_outwakeup()' in uart_tty.c. 'uart_tty_outwakeup()' is responsible for draining the TTY buffer and sending the contents to the UART driver.

4. Instrumenting the install, we found that while 'ixon' -- software flow-control -- is enabled, so is 'ixany' -- release the software flow-control pause when any character is input.
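
One way to check the flow-control settings mentioned in item 4 from userland is to read the terminal's input flags through termios. The following is a minimal sketch, not the instrumentation we actually used; the helper name `flow_control_flags' and the default device path `/dev/tty' are assumptions for illustration.

```python
import sys
import termios


def flow_control_flags(iflag):
    """Return the names of the software flow-control bits set in `iflag`."""
    names = ("IXON", "IXOFF", "IXANY")
    return [name for name in names if iflag & getattr(termios, name)]


if __name__ == "__main__":
    # Assumption: the terminal to inspect is passed as argv[1], else /dev/tty.
    path = sys.argv[1] if len(sys.argv) > 1 else "/dev/tty"
    with open(path) as tty:
        # tcgetattr() returns [iflag, oflag, cflag, lflag, ispeed, ospeed, cc].
        iflag = termios.tcgetattr(tty.fileno())[0]
    print(path, flow_control_flags(iflag))
```

If both 'IXON' and 'IXANY' are reported, output paused by a received XOFF should resume on any input character, not only on XON.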

Our investigation continues.

In the meantime, while this was seen in the context of `bsdinstall', this looks like a bug in the UART and/or TTY drivers; 'kern', not 'bin'.
Comment 2 Ravi Pokala 2025-05-12 18:58:21 UTC
One of my colleagues commented out the aforementioned 'boot_serial="YES"', causing the system to use the video console in preference to the serial console. With that change, they were able to successfully boot from LAN and install over a dozen times.

That is a clear indication that the problem is on the serial side of the equation.
Comment 3 Ravi Pokala 2025-05-12 19:58:00 UTC
When using the video console -- which is implemented in console_tty.c -- the debugger key-combination ([ctrl]-[alt]-[esc]) successfully enters the debugger. That is another datapoint suggesting the problem is on the serial side of things, probably in uart_tty.c.