Bug 279901 - glibc-2.39-2 and above on the host segfault
Summary: glibc-2.39-2 and above on the host segfault
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: bhyve
Version: 14.1-RELEASE
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: Konstantin Belousov
URL: https://sourceware.org/bugzilla/show_...
Keywords:
Duplicates: 282927
Depends on:
Blocks:
 
Reported: 2024-06-22 01:17 UTC by holo
Modified: 2025-05-11 11:16 UTC
CC: 16 users

See Also:


Attachments
ld.so --list-diagnostics from glibc240 (7.50 KB, text/plain), 2024-12-07 21:38 UTC, Getz Mikalsen
ld-linux-x86-64.so.2 --list-diagnostics from glibc2.40 (40.87 KB, text/plain), 2024-12-08 11:07 UTC, Getz Mikalsen
x86info-r-vanilla-bhyve-14-1-release.txt (4.77 KB, text/plain), 2024-12-24 02:41 UTC, Koichiro Iwao
x86info-r-patched-bhyve-14-stable.txt (4.89 KB, text/plain), 2024-12-24 02:42 UTC, Koichiro Iwao
almalinux-8-patched-bhyve-ldso-list-diagnostics.txt (6.25 KB, text/plain), 2024-12-26 05:59 UTC, Koichiro Iwao

Description holo 2024-06-22 01:17:54 UTC
Reproduction steps:

1. Get a current Arch Linux ISO (or another rolling-release Linux distribution). The following steps assume Arch Linux.
2. Boot the install medium inside a bhyve VM and attempt to run any of: vim, python3, archinstall, gdb (if installed), localedef.
3. All of the above crash with a segfault (SIGSEGV) and error 4 (a user-mode read that found no page).
4. Downgrading to glibc-2.39-1 fixes all of the above applications, though in the case of bootstrapping scripts like archinstall this can fail to work if, for instance, the script re-downloads glibc.

Existing board post discussing this: https://bbs.archlinux.org/viewtopic.php?id=295802

offending commit: https://sourceware.org/git/?p=glibc.git;a=commit;h=aa4249266e9906c4bc833e4847f4d8feef59504f

Affects:
- Ryzen 5 7600, possibly more AMD Zen3 & Zen4 CPUs

Last working version:
- linux glibc-2.39-1

Relevant /boot/loader.conf:
vmm_load="YES"
hw.vmm.amdvi.enable="1"

Relevant /etc/rc.conf:
vm_enable="YES"
vm_dir="zfs:zroot/vm"

vm-bhyve configuration file:
loader="uefi"
graphics="yes"
xhci_mouse="yes"

cpu="8"
cpu_sockets="1"
cpu_cores="4"
cpu_threads="2"

memory="8G"

ahci_device_limit="8"

network0_type="virtio-net"
network0_switch="public"

disk0_type="nvme"
disk0_name="disk0.img"
Comment 1 tennix 2024-06-23 04:35:32 UTC
Recently I tried to install NixOS 24.05 in a bhyve VM; the official ISO crashed during boot, but 23.11 could be installed without any issues. Checking the glibc package in each NixOS release, nixos-24.05 uses glibc 2.39-52 while nixos-23.11 uses glibc 2.38-77. I think this might be the same issue.
Comment 2 Konstantin Belousov freebsd_committer freebsd_triage 2024-06-24 03:03:08 UTC
So what is the instruction that faults?
Comment 3 jordy 2024-07-02 21:39:53 UTC
How can I help to debug this? On my arch vm I still downgrade glibc, which only works for so long on a rolling release. I tried to install Fedora Server 40 (netinstall) which also fails during installation.

It's kind of difficult to debug as gdb will also crash when the latest glibc is installed. The other user of the arch forum made some backtraces of some applications crashing: https://bbs.archlinux.org/viewtopic.php?pid=2172581#p2172581
Comment 4 Raúl 2024-07-17 06:39:19 UTC
Upgrading a debian sid today

[....]
Unpacking libc6:amd64 (2.39-4) over (2.38-14) ...
Setting up libc6:amd64 (2.39-4) ...
free(): invalid pointer
dpkg: error processing package libc6:amd64 (--configure):
 installed libc6:amd64 package post-installation script subprocess was killed by signal (Aborted), core dumped
Errors were encountered while processing:
 libc6:amd64
Error: Sub-process /usr/bin/dpkg returned an error code (1)
[....]
Comment 5 Raúl 2024-07-17 06:43:08 UTC
CPU: AMD Ryzen 9 5950X 16-Core Processor             (3393.72-MHz K8-class CPU)
Comment 6 Raúl 2024-07-17 06:49:45 UTC
I see on that sid dmesg:

[173364.415802] apt[5234]: segfault at 7f29a9d1aff8 ip 00007f29aa36ff25 sp 00007ffcc0b4ad18 error 4 in libc.so.6[7f29aa23f000+158000] likely on CPU 3 (core 0, socket 3)
[173364.415815] Code: fe 6f 76 40 48 8d 8c 17 7f ff ff ff c5 fe 6f 7e 60 c5 7e 6f 44 16 e0 48 29 fe 48 83 e1 e0 48 01 ce 0f 1f 40 00 c5 fe 6f 4e 60 <c5> fe 6f 56 40 c5 fe 6f 5e 20 c5 fe 6f 26 48 83 c6 80 c5 fd 7f 49
[173385.774003] traps: appstreamcli[5351] general protection fault ip:7f793c03993a sp:7ffcfcdaae90 error:0 in libc.so.6[7f793bfc5000+158000]

I don't know Linux too well; let me know if I can help.
Comment 7 Kyle Evans freebsd_committer freebsd_triage 2024-07-17 07:44:17 UTC
(In reply to jordy from comment #3)

> It's kind of difficult to debug as gdb will also crash when the latest glibc is installed. The other user of the arch forum made some backtraces of some applications crashing: https://bbs.archlinux.org/viewtopic.php?pid=2172581#p2172581

The easy solution is presumably to crash it & get a core, then downgrade glibc and use gdb.  If you load the core, `disas` should presumably get us what we need to make any kind of progress.
Comment 8 Konstantin Belousov freebsd_committer freebsd_triage 2024-07-17 07:50:32 UTC
(In reply to Kyle Evans from comment #7)
Most likely not, unfortunately.  Downgrading libc would cause the wrong code
to be disassembled, because text is not normally dumped into the core.

Is there a way to affect the glibc selection of the CPU-optimized functions,
like our "ARCHLEVEL" env var?
Comment 9 Kyle Evans freebsd_committer freebsd_triage 2024-07-17 08:02:02 UTC
(In reply to Konstantin Belousov from comment #8)

One could `add-symbol-file` a saved-off copy of the broken shlib at the expected offset and get the expected disassembly, perhaps? Not as easy as written before, but an option if there's no env var for this.
Comment 10 Konstantin Belousov freebsd_committer freebsd_triage 2024-07-17 08:06:03 UTC
Perhaps LD_LIBRARY_PATH to force the crashing binary to use crashing libc is
the easiest.
Comment 11 bugzilla 2024-07-21 16:22:56 UTC
Is this the kind of thing you need? "disas" didn't work, so I tried dumping the instructions near the program counter instead. (I have no idea what I'm doing when it comes to gdb.)

  root@localhost:~# gdb --core=python3.core
  GNU gdb (Debian 13.2-1+b2) 13.2
  ...
  Core was generated by `python3'.
  Program terminated with signal SIGSEGV, Segmentation fault.
  #0  0x0000000000553914 in ?? ()

  (gdb) bt
  #0  0x0000000000553914 in ?? ()
  #1  0x0000000000000000 in ?? ()

  (gdb) disas
  No function contains program counter for selected frame.

  (gdb) x/50i ($pc - 100)
     0x5538b0:	test   %eax,%eax
     0x5538b2:	je     0x554064
     0x5538b8:	test   %eax,%eax
     0x5538ba:	jns    0x55406d
     0x5538c0:	mov    %r14,%r12
     0x5538c3:	cmp    %r14,%r15
     0x5538c6:	jae    0x553f1d
     0x5538cc:	mov    %ebp,%r14d
     0x5538cf:	shr    $0x6,%bpl
     0x5538d3:	lea    0x28(%r13),%rax
     0x5538d7:	mov    %r13,0x28(%rsp)
     0x5538dc:	and    $0x1,%ebp
     0x5538df:	shr    $0x5,%r14b
     0x5538e3:	mov    %rax,0x10(%rsp)
     0x5538e8:	mov    %r12,%r13
     0x5538eb:	mov    %bpl,0x8(%rsp)
     0x5538f0:	and    $0x1,%r14d
     0x5538f4:	mov    %rbx,0x30(%rsp)
     0x5538f9:	mov    %r14d,%ebx
     0x5538fc:	mov    %r8,%r14
     0x5538ff:	mov    %r13,%rax
     0x553902:	mov    %r14,%rdx
     0x553905:	sub    %r15,%rax
     0x553908:	sar    $0x4,%rax
     0x55390c:	lea    (%r15,%rax,8),%rbp
     0x553910:	mov    0x0(%rbp),%rsi
  => 0x553914:	mov    0x10(%rsi),%r12
     0x553918:	movzbl 0x20(%rsi),%eax
     0x55391c:	cmp    %r14,%r12
     0x55391f:	cmovle %r12,%rdx
     0x553923:	test   $0x20,%al
     0x553925:	je     0x451a14
     0x55392b:	test   $0x40,%al
     0x55392d:	je     0x554dfc
     0x553933:	add    $0x28,%rsi
     0x553937:	test   %bl,%bl
     0x553939:	je     0x555085
     0x55393f:	cmpb   $0x0,0x8(%rsp)
     0x553944:	je     0x554018
     0x55394a:	mov    0x10(%rsp),%rdi
     0x55394f:	call   0x4217f0
     0x553954:	test   %eax,%eax
     0x553956:	je     0x554030
     0x55395c:	test   %eax,%eax
     0x55395e:	jns    0x554040
     0x553964:	cmp    %rbp,%r15
     0x553967:	jae    0x55404d
     0x55396d:	mov    %rbp,%r13
     0x553970:	jmp    0x5538ff
     0x553972:	nopw   0x0(%rax,%rax,1)

And for vim:

  root@localhost:~# gdb --core=vim.core
  GNU gdb (Debian 13.2-1+b2) 13.2
  ...
  Core was generated by `vim'.
  Program terminated with signal SIGABRT, Aborted.
  #0  0x00007fee03ec47a7 in ?? ()

  (gdb) bt
  #0  0x00007fee03ec47a7 in ?? ()
  #1  0x0000559fc8dc4831 in ?? ()
  #2  0x00007fffc3822820 in ?? ()
  #3  0x00000000000001a6 in ?? ()
  #4  0x00007fee03de9440 in ?? ()
  #5  <signal handler called>
  #6  0x00007fee03f1339c in ?? ()
  #7  0x00007fffc3822860 in ?? ()
  #8  0x2c0d8adf099bf900 in ?? ()
  #9  0x0000000000000006 in ?? ()
  #10 0x00007fee03de9440 in ?? ()
  #11 0x00007fffc3822820 in ?? ()
  #12 0x00007fffc3822820 in ?? ()
  #13 0x00007fffc3822820 in ?? ()
  #14 0x00007fee03ec44f2 in ?? ()
  #15 0x00007fee04060b50 in ?? ()
  #16 0x00007fee03ead4ed in ?? ()
  #17 0x0000000000000020 in ?? ()
  #18 0x0000000000000000 in ?? ()

  (gdb) x/50i ($pc - 100)
     0x7fee03ec4743:	jne    0x7fee03ec4652
     0x7fee03ec4749:	xor    %edx,%edx
     0x7fee03ec474b:	xor    %esi,%esi
     0x7fee03ec474d:	jmp    0x7fee03ec4652
     0x7fee03ec4752:	nopw   0x0(%rax,%rax,1)
     0x7fee03ec4758:	mov    0x19a699(%rip),%rdx        # 0x7fee0405edf8
     0x7fee03ec475f:	neg    %eax
     0x7fee03ec4761:	mov    %eax,%fs:(%rdx)
     0x7fee03ec4764:	mov    $0xffffffff,%edx
     0x7fee03ec4769:	jmp    0x7fee03ec4717
     0x7fee03ec476b:	call   0x7fee03f98b20
     0x7fee03ec4770:	sub    $0x8,%rsp
     0x7fee03ec4774:	call   0x7fee03f18220
     0x7fee03ec4779:	test   %eax,%eax
     0x7fee03ec477b:	jne    0x7fee03ec4788
     0x7fee03ec477d:	add    $0x8,%rsp
     0x7fee03ec4781:	ret
     0x7fee03ec4782:	nopw   0x0(%rax,%rax,1)
     0x7fee03ec4788:	mov    0x19a669(%rip),%rdx        # 0x7fee0405edf8
     0x7fee03ec478f:	mov    %eax,%fs:(%rdx)
     0x7fee03ec4792:	mov    $0xffffffff,%eax
     0x7fee03ec4797:	jmp    0x7fee03ec477d
     0x7fee03ec4799:	nopl   0x0(%rax)
     0x7fee03ec47a0:	mov    $0x3e,%eax
     0x7fee03ec47a5:	syscall
  => 0x7fee03ec47a7:	cmp    $0xfffffffffffff001,%rax
     0x7fee03ec47ad:	jae    0x7fee03ec47b0
     0x7fee03ec47af:	ret
     0x7fee03ec47b0:	mov    0x19a641(%rip),%rcx        # 0x7fee0405edf8
     0x7fee03ec47b7:	neg    %eax
     0x7fee03ec47b9:	mov    %eax,%fs:(%rcx)
     0x7fee03ec47bc:	or     $0xffffffffffffffff,%rax
     0x7fee03ec47c0:	ret
     0x7fee03ec47c1:	cs nopw 0x0(%rax,%rax,1)
     0x7fee03ec47cb:	nopl   0x0(%rax,%rax,1)
     0x7fee03ec47d0:	mov    $0x8,%esi
     0x7fee03ec47d5:	mov    $0x7f,%eax
     0x7fee03ec47da:	syscall
     0x7fee03ec47dc:	cmp    $0xfffffffffffff000,%rax
     0x7fee03ec47e2:	ja     0x7fee03ec47e8
     0x7fee03ec47e4:	ret
     0x7fee03ec47e5:	nopl   (%rax)
     0x7fee03ec47e8:	mov    0x19a609(%rip),%rdx        # 0x7fee0405edf8
     0x7fee03ec47ef:	neg    %eax
     0x7fee03ec47f1:	mov    %eax,%fs:(%rdx)
     0x7fee03ec47f4:	mov    $0xffffffff,%eax
     0x7fee03ec47f9:	ret
     0x7fee03ec47fa:	nopw   0x0(%rax,%rax,1)
     0x7fee03ec4800:	cmpb   $0x0,0x1a2839(%rip)        # 0x7fee04067040
     0x7fee03ec4807:	je     0x7fee03ec4820

To get the above output, I used the latest Debian Sid nocloud image:
https://cloud.debian.org/cdimage/cloud/sid/daily/20240721-1815/debian-sid-nocloud-amd64-daily-20240721-1815.tar.xz

I ran it on bhyve on AMD to get the core dump and on KVM on Intel to debug it. The debug build of Python (python3-dbg) doesn't crash, so I don't know how to get debug symbols.
Comment 12 Konstantin Belousov freebsd_committer freebsd_triage 2024-07-22 09:39:51 UTC
Just in case, could you also please show the general-purpose register values ((gdb) info registers).

From what you posted, the damage was done elsewhere, and this is an innocent
general-purpose instruction that received an invalid memory address.  Is it possible
to install debugging symbols on the Linux distro you use?
Comment 13 bugzilla 2024-07-23 20:18:28 UTC
(In reply to Konstantin Belousov from comment #12)

Here are the register values you asked for. Installing the debug symbols using debuginfod (or find-dbgsym-packages) doesn't seem to have changed the backtraces.

The damage is presumably done by something related to memcpy/memmove since this commit is what causes the symptoms to manifest:
https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=aa4249266e9906c4bc833e4847f4d8feef59504f;hp=5a461f2949ded98d8211939f84988bc464c7b4fe

Python:

  root@localhost:~# gdb --core=python3.core
  GNU gdb (Debian 13.2-1+b2) 13.2
  ...
  This GDB supports auto-downloading debuginfo from the following URLs:
    <https://debuginfod.debian.net>
  Enable debuginfod for this session? (y or [n]) y
  Debuginfod has been enabled.
  To make this setting permanent, add 'set debuginfod enabled on' to .gdbinit.
  Core was generated by `python3'.
  Program terminated with signal SIGSEGV, Segmentation fault.
  #0  0x0000000000553914 in ?? ()

  (gdb) bt
  #0  0x0000000000553914 in ?? ()
  #1  0x0000000000000000 in ?? ()

  (gdb) info registers
  rax            0x5                 5
  rbx            0x1                 1
  rcx            0x7                 7
  rdx            0xc                 12
  rsi            0xa2967             665959
  rdi            0x7f1c2020d318      139758774833944
  rbp            0x7f1c201a4458      0x7f1c201a4458
  rsp            0x7ffd271820b0      0x7ffd271820b0
  r8             0xc                 12
  r9             0x1                 1
  r10            0x7f1c202eb078      139758775742584
  r11            0x7f1c20434d00      139758777093376
  r12            0x7f1c201a4480      139758774404224
  r13            0x7f1c201a4480      139758774404224
  r14            0xc                 12
  r15            0x7f1c201a4430      139758774404144
  rip            0x553914            0x553914
  eflags         0x10216             [ PF AF IF RF ]
  cs             0x33                51
  ss             0x2b                43
  ds             0x0                 0
  es             0x0                 0
  fs             0x0                 0
  gs             0x0                 0

Vim:

  root@localhost:~# gdb --core=vim.core
  GNU gdb (Debian 13.2-1+b2) 13.2
  ...
  This GDB supports auto-downloading debuginfo from the following URLs:
    <https://debuginfod.debian.net>
  Enable debuginfod for this session? (y or [n]) y
  Debuginfod has been enabled.
  To make this setting permanent, add 'set debuginfod enabled on' to .gdbinit.
  Core was generated by `vim'.
  Program terminated with signal SIGABRT, Aborted.
  #0  0x00007fee03ec47a7 in ?? ()

  (gdb) bt
  #0  0x00007fee03ec47a7 in ?? ()
  #1  0x0000559fc8dc4831 in ?? ()
  #2  0x00007fffc3822820 in ?? ()
  #3  0x00000000000001a6 in ?? ()
  #4  0x00007fee03de9440 in ?? ()
  #5  <signal handler called>
  #6  0x00007fee03f1339c in ?? ()
  #7  0x00007fffc3822860 in ?? ()
  #8  0x2c0d8adf099bf900 in ?? ()
  #9  0x0000000000000006 in ?? ()
  #10 0x00007fee03de9440 in ?? ()
  #11 0x00007fffc3822820 in ?? ()
  #12 0x00007fffc3822820 in ?? ()
  #13 0x00007fffc3822820 in ?? ()
  #14 0x00007fee03ec44f2 in ?? ()
  #15 0x00007fee04060b50 in ?? ()
  #16 0x00007fee03ead4ed in ?? ()
  #17 0x0000000000000020 in ?? ()
  #18 0x0000000000000000 in ?? ()

  (gdb) info registers
  rax            0x0                 0
  rbx            0x1                 1
  rcx            0x7fee03ec47a7      140660244760487
  rdx            0x0                 0
  rsi            0x6                 6
  rdi            0x1a6               422
  rbp            0x6                 0x6
  rsp            0x7fffc38220d8      0x7fffc38220d8
  r8             0x7fffc3822020      140736473473056
  r9             0x559fdb866f50      94145071181648
  r10            0x8                 8
  r11            0x206               518
  r12            0x7fffc3822820      140736473475104
  r13            0x6                 6
  r14            0x7fffc3822820      140736473475104
  r15            0x7fffc3822820      140736473475104
  rip            0x7fee03ec47a7      0x7fee03ec47a7
  eflags         0x206               [ PF IF ]
  cs             0x33                51
  ss             0x2b                43
  ds             0x0                 0
  es             0x0                 0
  fs             0x0                 0
  gs             0x0                 0
Comment 14 Evgenii Khramtsov (inactive) 2024-11-24 04:06:31 UTC
*** Bug 282927 has been marked as a duplicate of this bug. ***
Comment 15 Koichiro Iwao freebsd_committer freebsd_triage 2024-11-27 01:01:07 UTC
Hi, I'm also an AlmaLinux member and encountered the same issue with AlmaLinux 9.5 and 10 Kitten. AlmaLinux 10 Kitten has glibc 2.39 but 9.x has older versions.

In AlmaLinux 9, glibc-2.34-100.el9_4.4 for 9.4 doesn't have the issue, but glibc-2.34-125.el9_5.1 for 9.5 does, so some change between those versions must be responsible. It turned out to be the offending commit the reporter mentioned.

https://git.almalinux.org/rpms/glibc/commit/4da5357bcd7b44f2ee8891f477a5a6c34973ddba
https://gitlab.com/redhat/centos-stream/rpms/glibc/-/commit/6dbf26d6f46fce37e63fcf2eec542b03ef4e0704

The AlmaLinux team reverted the changes, which avoided the issue. I'm still not sure whether this is a glibc bug or a bhyve bug; at least I've never seen this issue outside bhyve. Has anyone seen this on bare-metal Zen 3/4 machines?
Comment 16 Evgenii Khramtsov (inactive) 2024-11-29 11:51:21 UTC
Not exposing ERMS in CPUID works fine here as a workaround with Arch Linux@Zen 3:

diff --git a/sys/amd64/vmm/x86.c b/sys/amd64/vmm/x86.c
--- a/sys/amd64/vmm/x86.c
+++ b/sys/amd64/vmm/x86.c
@@ -434,7 +434,7 @@ x86_emulate_cpuid(struct vcpu *vcpu, uint64_t *rax, uint64_t *rbx,
                                    CPUID_STDEXT_BMI1 | CPUID_STDEXT_HLE |
                                    CPUID_STDEXT_AVX2 | CPUID_STDEXT_SMEP |
                                    CPUID_STDEXT_BMI2 |
-                                   CPUID_STDEXT_ERMS | CPUID_STDEXT_RTM |
+                                   CPUID_STDEXT_RTM |
                                    CPUID_STDEXT_AVX512F |
                                    CPUID_STDEXT_AVX512DQ |
                                    CPUID_STDEXT_RDSEED |
Comment 17 Getz Mikalsen 2024-11-29 22:05:02 UTC
(In reply to Evgenii Khramtsov from comment #16)
Thanks for the patch, works here as well for epyc milan.
Comment 18 Florian Weimer 2024-12-06 04:08:12 UTC
If you copy over /usr/bin/ld.so from glibc 2.40 into an older distribution (without replacing the installed /usr/bin/ld.so on the older distribution), does “ld.so --list-diagnostics” work? If so, please attach its output.
Comment 19 Adhemerval Zanella 2024-12-06 13:34:43 UTC
I am trying to reproduce it with a different hypervisor/emulator (in this case qemu/kvm) on a Ryzen 9 5900X Zen 3 core, but both AlmaLinux 10 Kitten (glibc 2.39) and Debian sid (glibc 2.40) boot and work without any issue.

I also verified on Debian sid that the selected memcpy/memmove is indeed the one optimized by the glibc change (__memcpy_avx_unaligned_erms). I even ran the glibc memcpy/memmove tests in this VM, which stress a lot of different sizes and alignments for the different memcpy/memmove implementations.

Also, my daily workstation (Ryzen 9 5900X) uses a recent glibc that contains this change, and I haven't seen any memcpy/memmove-related issue.

So I am not sure this is a glibc issue.
Comment 20 Getz Mikalsen 2024-12-07 21:38:02 UTC
Created attachment 255691 [details]
ld.so --list-diagnostics from glibc240
Comment 21 Florian Weimer 2024-12-08 10:08:31 UTC
(In reply to Getz Mikalsen from comment #20)
> ld.so --list-diagnostics from glibc240

Sorry, this says “version.version="2.36"” in the dump, and it lacks the CPUID diagnostics.
Comment 22 Getz Mikalsen 2024-12-08 10:26:22 UTC
(In reply to Florian Weimer from comment #21)
It's /usr/bin/ld.so from glibc 2.40 copied over into an older distribution running 2.36 just like you asked.

I was able to install the latest Arch Linux ISO by not exposing ERMS in CPUID. I then reverted the patch and it still booted, although it would easily segfault.
I can attach the --list-diagnostics output from that VM if that would be of more use.
Comment 23 Florian Weimer 2024-12-08 10:31:58 UTC
(In reply to Getz Mikalsen from comment #22)
> It's /usr/bin/ld.so from glibc 2.40 copied over into an older distribution running 2.36 just like you asked.

It looks like you copied over the symbolic link instead of the symbolic link target. Sorry, I should have mentioned that.
Comment 24 Getz Mikalsen 2024-12-08 11:07:28 UTC
Created attachment 255708 [details]
ld-linux-x86-64.so.2 --list-diagnostics from glibc2.40

(In reply to Florian Weimer from comment #23)
Sorry, my bad. Here is the actual ld-linux-x86-64.so.2 output, no symlink this time. :-)
Comment 25 Florian Weimer 2024-12-09 12:32:23 UTC
(In reply to Getz Mikalsen from comment #24)

I can confirm that the data looks complete now, thanks. Now I just have to find someone who can make sense of it. 8-)
Comment 26 Darren Henderson 2024-12-11 21:11:11 UTC
Just another data point that seems to be related....

This bug rears its head on a Ryzen 7 5700G system running 14.2-RELEASE.

It is currently impossible to install a bhyve instance with any of the Fedora derivatives (AlmaLinux 9.5, CentOS Stream 9, Rocky 9.5, Oracle 9.5, and Fedora 41).

You can install AlmaLinux 9.4 or Rocky 9.4 successfully (I haven't tried all the others) and it runs perfectly. Doing a "dnf upgrade" on those leads to a borked system, as segmentation faults start showing up as soon as glibc is upgraded.
Comment 27 Koichiro Iwao freebsd_committer freebsd_triage 2024-12-12 00:41:40 UTC
(In reply to Darren Henderson from comment #26)
Yes, the 9.5 releases of the RHEL derivatives include this patch to glibc, and it causes the issue:
https://gitlab.com/redhat/centos-stream/rpms/glibc/-/commit/6dbf26d6f46fce37e63fcf2eec542b03ef4e0704

I reported the issue to glibc upstream, but it seems the cause is on the bhyve side, because I've only seen this problem on bhyve.
https://sourceware.org/bugzilla/show_bug.cgi?id=30994

A bhyve guru is wanted!
Comment 28 Darren Henderson 2024-12-13 19:34:30 UTC
This has apparently been sitting unassigned for nearly six months now; can we raise the priority and the scope of affected systems?

It is present in 14.2 as well as 14.1 and is going to become more critical as the glibc versions in question propagate further and further. I haven't seen any indication that it only affects a few AMD CPUs, so chances are it's going to start snowballing.

I don't want to overstate it, but the potential is there for bhyve to become less and less functional for a significant number of people.
Comment 29 Mark Peek freebsd_committer freebsd_triage 2024-12-13 20:13:00 UTC
(In reply to Darren Henderson from comment #28)

The glibc changes indicated in this bug were specific to AMD. I have tried numerous times to reproduce this issue on Intel systems but have not seen it occur. Sadly I don't have an applicable AMD system to test on.

Here's my arch install as a bhyve guest:
$ uname -a
Linux archlinux 6.12.4-arch1-1 #1 SMP PREEMPT_DYNAMIC Mon, 09 Dec 2024 14:31:57 +0000 x86_64 GNU/Linux
$ ld.so --list-diagnostics | grep -e version.version
version.version="2.40"

Given some of the ld.so output in previous updates to this bug, I did a diff to try to find the x86 tunable changes between them.
43c40
< version.version="2.36"
---
> version.version="2.40"
189,191c196,199
< x86.cpu_features.rep_movsb_threshold=0x2000
< x86.cpu_features.rep_movsb_stop_threshold=0x80000
< x86.cpu_features.rep_stosb_threshold=0x800
---
> x86.cpu_features.memset_non_temporal_threshold=0xc000000000
> x86.cpu_features.rep_movsb_threshold=0x0
> x86.cpu_features.rep_movsb_stop_threshold=0xc000000000
> x86.cpu_features.rep_stosb_threshold=0xffffffffffffffff

Running binaries with this environment set, so that the Intel system would run the same glibc code path as AMD, did not cause the issue on my Intel system.

GLIBC_TUNABLES="glibc.cpu.x86_memset_non_temporal_threshold=0xc000000000:glibc.cpu.x86_rep_movsb_threshold=0x0:glibc.cpu.x86_rep_movsb_stop_threshold=0xc000000000:glibc.cpu.x86_rep_stosb_threshold=0xffffffffffffffff"

If anyone has seen it on intel, it would be great to have the repro steps added to this bug.
Comment 30 Konstantin Belousov freebsd_committer freebsd_triage 2024-12-14 10:39:25 UTC
I do believe that the problem is specific to AMD.  The AMD hw virtualization
assist (SVM) is completely different from the Intel facility (VMX), and perhaps
there is a bug in bhyve.

What is not clear to me, is it a bug to look for in bhyve, or some hw quirk
in the reported CPU models.

As a first step, could somebody extract the code that is used in the broken
configuration?  I suspect that it is actually ERMS (or FSRM) + AVX2 and not
just the ERMS memcpy().
Comment 31 Michael Dexter freebsd_triage 2024-12-17 09:24:27 UTC
This is reportedly fixed in Alma Linux.
Comment 32 Konstantin Belousov freebsd_committer freebsd_triage 2024-12-17 09:56:04 UTC
(In reply to Michael Dexter from comment #31)
Can you provide any details on the supposed fix?
Comment 33 Florian Weimer 2024-12-17 10:07:57 UTC
If someone can provide me SSH access to a guest system that exhibits the issue once glibc is updated, I can try to debug it (assuming that GDB/ptrace works on the guest, but I don't see why it wouldn't). No root access or glibc update would be needed; the issue should reproduce with an uninstalled upstream rebuild of glibc.

I'm not convinced I will be able to reproduce it locally.
Comment 34 Koichiro Iwao freebsd_committer freebsd_triage 2024-12-17 13:06:38 UTC
(In reply to Konstantin Belousov from comment #32)
> This is reportedly fixed in Alma Linux.

I don't know exactly what Michael Dexter meant, but this is a misunderstanding. Speaking with my AlmaLinux-developer hat on, I would say that the issue is not FIXED on the AlmaLinux side.

AlmaLinux just reverted the offending commit to AVOID the issue. That is not a fix at all; there has been no progress on the AlmaLinux side toward a fundamental fix.
Comment 35 Koichiro Iwao freebsd_committer freebsd_triage 2024-12-17 13:11:58 UTC
(In reply to Florian Weimer from comment #33)
I can provide you access to the system if you're IPv6 reachable. Could you email me the public SSH key?
Comment 36 Florian Weimer 2024-12-17 16:59:36 UTC
Thanks for the offers of machine access.

I should have studied the ld.so --list-diagnostics output. It's a recurrence of the previous rep_movsb_threshold bug because it ends up as zero. The bhyve bug that triggers this is that it reports 1 TiB of L3 cache (x86.cpu_features.level3_cache_size=0x10000000000 in the diagnostic output). This triggers an integer truncation in glibc's cache size computation. Misreporting cache information like this typically impacts performance, so it should be fixed independently of the glibc bug.

The glibc bug is below. I'll submit it upstream and we'll backport it.

diff --git a/sysdeps/x86/dl-cacheinfo.h b/sysdeps/x86/dl-cacheinfo.h
index e9579505..6a0a30ba 100644
--- a/sysdeps/x86/dl-cacheinfo.h
+++ b/sysdeps/x86/dl-cacheinfo.h
@@ -1021,11 +1021,11 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
     non_temporal_threshold = maximum_non_temporal_threshold;
 
   /* NB: The REP MOVSB threshold must be greater than VEC_SIZE * 8.  */
-  unsigned int minimum_rep_movsb_threshold;
+  unsigned long int minimum_rep_movsb_threshold;
   /* NB: The default REP MOVSB threshold is 4096 * (VEC_SIZE / 16) for
      VEC_SIZE == 64 or 32.  For VEC_SIZE == 16, the default REP MOVSB
      threshold is 2048 * (VEC_SIZE / 16).  */
-  unsigned int rep_movsb_threshold;
+  unsigned long int rep_movsb_threshold;
   if (CPU_FEATURE_USABLE_P (cpu_features, AVX512F)
       && !CPU_FEATURE_PREFERRED_P (cpu_features, Prefer_No_AVX512))
     {

With this fix, the testsuite is very clean; only nptl/tst-mutex10 fails, with a test timeout. The cause is unclear. (The test does not actually use elision because the system does not support RTM.)
Comment 37 Konstantin Belousov freebsd_committer freebsd_triage 2024-12-17 21:08:02 UTC
(In reply to Florian Weimer from comment #36)
Do you see which CPUID leaf causes the trouble?

As a guess, I wonder if the following bhyve patch helps (it tries to fix
CPUID leaf 0x8000_001D %ecx 3):

diff --git a/sys/amd64/vmm/x86.c b/sys/amd64/vmm/x86.c
index a833b61786e7..8474666b5e6f 100644
--- a/sys/amd64/vmm/x86.c
+++ b/sys/amd64/vmm/x86.c
@@ -256,7 +256,7 @@ x86_emulate_cpuid(struct vcpu *vcpu, uint64_t *rax, uint64_t *rbx,
 				func = 3;	/* unified cache */
 				break;
 			default:
-				logical_cpus = 0;
+				logical_cpus = sockets * threads * cores;
 				level = 0;
 				func = 0;
 				break;
Comment 38 Koichiro Iwao freebsd_committer freebsd_triage 2024-12-18 05:07:22 UTC
(In reply to Konstantin Belousov from comment #37)
This didn't help. It still reports the same l3 cache size.

$ ld.so --list-diagnostics |grep cache
x86.cpu_features.data_cache_size=0x8000
x86.cpu_features.shared_cache_size=0x1000000000
x86.cpu_features.level1_icache_size=0x8000
x86.cpu_features.level1_icache_linesize=0x40
x86.cpu_features.level1_dcache_size=0x8000
x86.cpu_features.level1_dcache_assoc=0x8
x86.cpu_features.level1_dcache_linesize=0x40
x86.cpu_features.level2_cache_size=0x80000
x86.cpu_features.level2_cache_assoc=0x8
x86.cpu_features.level2_cache_linesize=0x40
x86.cpu_features.level3_cache_size=0x1000000000
x86.cpu_features.level3_cache_assoc=0x0
x86.cpu_features.level3_cache_linesize=0x40
x86.cpu_features.level4_cache_size=0x0
x86.cpu_features.cachesize_non_temporal_divisor=0x4
Comment 39 Florian Weimer 2024-12-18 11:24:45 UTC
(In reply to Konstantin Belousov from comment #37)
> Do you see which CPUID leaf causes the trouble?

Let me try based on attachment 255708 [details]. The maximum leaf is 0x80000023 according to this:

x86.processor[0x0].cpuid.eax[0x80000000].eax=0x80000023

Ordinarily, handle_amd in sysdeps/x86/dl-cacheinfo.h would use the modern way of obtaining cache details, leaf 0x8000001D:

x86.processor[0x0].cpuid.subleaf_eax[0x8000001d].ecx[0x0].eax=0x121
x86.processor[0x0].cpuid.subleaf_eax[0x8000001d].ecx[0x0].ebx=0x3f
x86.processor[0x0].cpuid.subleaf_eax[0x8000001d].ecx[0x0].ecx=0x0
x86.processor[0x0].cpuid.subleaf_eax[0x8000001d].ecx[0x0].edx=0x0
x86.processor[0x0].cpuid.subleaf_eax[0x8000001d].ecx[0x1].eax=0x143
x86.processor[0x0].cpuid.subleaf_eax[0x8000001d].ecx[0x1].ebx=0x3f
x86.processor[0x0].cpuid.subleaf_eax[0x8000001d].ecx[0x1].ecx=0x0
x86.processor[0x0].cpuid.subleaf_eax[0x8000001d].ecx[0x1].edx=0x0
x86.processor[0x0].cpuid.subleaf_eax[0x8000001d].ecx[0x2].eax=0x163
x86.processor[0x0].cpuid.subleaf_eax[0x8000001d].ecx[0x2].ebx=0x3f
x86.processor[0x0].cpuid.subleaf_eax[0x8000001d].ecx[0x2].ecx=0x0
x86.processor[0x0].cpuid.subleaf_eax[0x8000001d].ecx[0x2].edx=0x0
x86.processor[0x0].cpuid.subleaf_eax[0x8000001d].ecx[0x3].eax=0x3ffc100
x86.processor[0x0].cpuid.subleaf_eax[0x8000001d].ecx[0x3].ebx=0x0
x86.processor[0x0].cpuid.subleaf_eax[0x8000001d].ecx[0x3].ecx=0x0
x86.processor[0x0].cpuid.subleaf_eax[0x8000001d].ecx[0x3].edx=0x0
x86.processor[0x0].cpuid.subleaf_eax[0x8000001d].ecx[0x3].until_ecx=0x1ff

L3 cache data is subleaf 3. We have a safety check that requires ECX != 0, in case hypervisors do not fill in this information, which is happening here. We fall back to the legacy way of obtaining cache size.  That uses leaf 0x80000006 for L3 cache information:

x86.processor[0x0].cpuid.eax[0x80000006].eax=0x48002200
x86.processor[0x0].cpuid.eax[0x80000006].ebx=0x68004200
x86.processor[0x0].cpuid.eax[0x80000006].ecx=0x2006140
x86.processor[0x0].cpuid.eax[0x80000006].edx=0x8009140

The base L3 cache size is 2 * (EDX & 0x3ffc0000), so 256 MiB. This is not unreasonable for an EPYC system, and it's probably right.

However, that number could be a per-socket number, and the way we use it for tuning, we need a per-thread amount. We adjust this using leaf 0x80000008. The thread count is (ECX & 0xff) + 1:

x86.processor[0x0].cpuid.eax[0x80000008].eax=0x3030
x86.processor[0x0].cpuid.eax[0x80000008].ebx=0x7
x86.processor[0x0].cpuid.eax[0x80000008].ecx=0x0
x86.processor[0x0].cpuid.eax[0x80000008].edx=0x10007

So we get 1, and there is no per-thread scale-down. (I think the hypervisor should expose a more realistic count here?)

If the CPU family is at least 0x17, we assume that the number is measured per core complex. And that comes again from leaf 0x8000001D, subleaf 3, but this time register EAX. It's computed as (EAX >> 14 & 0xfff) + 1. This evaluates to 4096 here, and I think this is the bug: the CCX count is just way too high. Based on the available information, the glibc code assumes that there are 4096 instances of 256 MiB caches, which translates to 1 TiB of L3 cache (per thread, but the thread count is 1).
Comment 40 Michael Dexter freebsd_triage 2024-12-18 17:13:23 UTC
(In reply to Koichiro Iwao from comment #34)
That was relayed from a colleague and I am awaiting a link. :-|
Comment 41 Konstantin Belousov freebsd_committer freebsd_triage 2024-12-18 23:49:22 UTC
(In reply to Florian Weimer from comment #39)
For 8000_001Dh, the %ecx == 0 reply seems to be legal for a fully associative cache.
This probably explains why my previous attempt did not work; let's arbitrarily
set assoc to 2 (%ecx == 1). From your explanation, and what I see in the code,
glibc should use the 'new way' then.

For 8000_0006h, bhyve reflects the data reported by the host CPU.

For 8000_0008h, the reported number of threads is user-controllable, AFAIR.
There is a strange force-fallback to legacy reporting of ApicIdSize (%ecx 15:12)
when the per-package CPU count is less than 16.  A useful experiment is to remove it.



diff --git a/sys/amd64/vmm/x86.c b/sys/amd64/vmm/x86.c
index a833b61786e7..b00ae12f802d 100644
--- a/sys/amd64/vmm/x86.c
+++ b/sys/amd64/vmm/x86.c
@@ -150,8 +150,8 @@ x86_emulate_cpuid(struct vcpu *vcpu, uint64_t *rax, uint64_t *rbx,
 				 * pkg_id_shift and other OSes may rely on it.
 				 */
 				width = MIN(0xF, log2(threads * cores));
-				if (width < 0x4)
-					width = 0;
+//				if (width < 0x4)
+//					width = 0;
 				logical_cpus = MIN(0xFF, threads * cores - 1);
 				regs[2] = (width << AMDID_COREID_SIZE_SHIFT) | logical_cpus;
 			}
@@ -256,7 +256,7 @@ x86_emulate_cpuid(struct vcpu *vcpu, uint64_t *rax, uint64_t *rbx,
 				func = 3;	/* unified cache */
 				break;
 			default:
-				logical_cpus = 0;
+				logical_cpus = sockets * threads * cores;
 				level = 0;
 				func = 0;
 				break;
@@ -266,7 +266,7 @@ x86_emulate_cpuid(struct vcpu *vcpu, uint64_t *rax, uint64_t *rbx,
 			regs[0] = (logical_cpus << 14) | (1 << 8) |
 			    (level << 5) | func;
 			regs[1] = (func > 0) ? (CACHE_LINE_SIZE - 1) : 0;
-			regs[2] = 0;
+			regs[2] = 1;	/* Num of cache ways */
 			regs[3] = 0;
 			break;
Comment 42 Koichiro Iwao freebsd_committer freebsd_triage 2024-12-23 09:20:02 UTC
(In reply to Konstantin Belousov from comment #41)
I still get the same result.

$ ld.so --list-diagnostics |grep cache
x86.cpu_features.data_cache_size=0x8000
x86.cpu_features.shared_cache_size=0x1000000000
x86.cpu_features.level1_icache_size=0x8000
x86.cpu_features.level1_icache_linesize=0x40
x86.cpu_features.level1_dcache_size=0x8000
x86.cpu_features.level1_dcache_assoc=0x8
x86.cpu_features.level1_dcache_linesize=0x40
x86.cpu_features.level2_cache_size=0x80000
x86.cpu_features.level2_cache_assoc=0x8
x86.cpu_features.level2_cache_linesize=0x40
x86.cpu_features.level3_cache_size=0x1000000000
x86.cpu_features.level3_cache_assoc=0x0
x86.cpu_features.level3_cache_linesize=0x40
x86.cpu_features.level4_cache_size=0x0
x86.cpu_features.cachesize_non_temporal_divisor=0x4
Comment 43 Florian Weimer 2024-12-23 10:06:59 UTC
(In reply to Koichiro Iwao from comment #42)
> I still get the same result.

With an updated glibc? Which version did you build, and how did you install it?
Comment 44 Konstantin Belousov freebsd_committer freebsd_triage 2024-12-23 19:05:05 UTC
(In reply to Florian Weimer from comment #43)
I am sure it is with the previous (not fixed) glibc and my bhyve patch applied.

(In reply to Koichiro Iwao from comment #42)
Could you please show the output 'x86info -r' from patched and non-patched bhyve?
Comment 45 Koichiro Iwao freebsd_committer freebsd_triage 2024-12-24 00:37:39 UTC
(In reply to Florian Weimer from comment #43)
The glibc I used here is not an updated version.

The updated glibc is working fine, and I backported your patch to AlmaLinux 9.5. The updated package will be released soon. I will also report it to CentOS Stream 9 to get the fix backported. Thanks for your effort.

https://git.almalinux.org/rpms/glibc/pulls/3

The issue I'm trying to address now is the 1 TiB L3 cache issue; I think it is still a bhyve issue and needs to be fixed separately from the glibc issue.

(In reply to Konstantin Belousov from comment #44)
I'll post it later.
Comment 46 Koichiro Iwao freebsd_committer freebsd_triage 2024-12-24 02:41:32 UTC
Created attachment 256095 [details]
x86info-r-vanilla-bhyve-14-1-release.txt
Comment 47 Koichiro Iwao freebsd_committer freebsd_triage 2024-12-24 02:42:11 UTC
Created attachment 256096 [details]
x86info-r-patched-bhyve-14-stable.txt
Comment 48 Koichiro Iwao freebsd_committer freebsd_triage 2024-12-24 02:45:23 UTC
(In reply to Konstantin Belousov from comment #44)
Sorry, it turned out that I had not correctly applied your patch from comment #41. The patch is properly applied now, and things look to be working properly.

$ ld.so --list-diagnostics |grep level3
x86.cpu_features.level3_cache_size=0x2
x86.cpu_features.level3_cache_assoc=0x1
x86.cpu_features.level3_cache_linesize=0x1
Comment 49 Konstantin Belousov freebsd_committer freebsd_triage 2024-12-24 02:52:07 UTC
(In reply to Koichiro Iwao from comment #48)
Ok, but does unpatched glibc work on patched bhyve?
Comment 50 Koichiro Iwao freebsd_committer freebsd_triage 2024-12-24 03:50:40 UTC
(In reply to Konstantin Belousov from comment #49)
Yes, unpatched glibc with patched bhyve works. After unpatching bhyve, it causes segfault again.
Comment 51 Konstantin Belousov freebsd_committer freebsd_triage 2024-12-24 04:22:53 UTC
https://reviews.freebsd.org/D48187
Comment 53 Koichiro Iwao freebsd_committer freebsd_triage 2024-12-26 05:16:31 UTC
The bhyve patch in comment #41 broke EL8 (AlmaLinux 8, RockyLinux 8). There was no issue with unpatched bhyve.

Host: FreeBSD 14-STABLE w/ patch comment #14 on AMD Ryzen 7 5700G
Guest: AlmaLinux 8.10, RockyLinux 8.10 (glibc-2.28-251.el8_10.2.x86_64)

Actually, I run an EL8 container on AlmaLinux 9 on bhyve.

AL9$ podman run -it --rm quay.io/almalinuxorg/almalinux:8 /bin/bash 
AL9$ echo $?
139
AL9$ dmesg | tail -2
[99577.163764] bash[2877]: segfault at 7ffdaa60f130 ip 00007fb9a9a12cf0 sp 00007ffdaa60e0e8 error 4 in libc-2.28.so[7fb9a996b000+1cd000] likely on CPU 1 (core 1, socket 1)
[99577.163773] Code: 00 00 0f 18 8e c0 20 00 00 0f 18 8e 80 30 00 00 0f 18 8e c0 30 00 00 c5 fe 6f 06 c5 fe 6f 4e 20 c5 fe 6f 56 40 c5 fe 6f 5e 60 <c5> fe 6f a6 00 10 00 00 c5 fe 6f ae 20 10 00 00 c5 fe 6f b6 40 10
Comment 54 Koichiro Iwao freebsd_committer freebsd_triage 2024-12-26 05:21:07 UTC
(In reply to Koichiro Iwao from comment #53)
> Host: FreeBSD 14-STABLE w/ patch comment #14 on AMD Ryzen 7 5700G

I meant comment #41 here.
Comment 55 Koichiro Iwao freebsd_committer freebsd_triage 2024-12-26 05:59:36 UTC
Created attachment 256146 [details]
almalinux-8-patched-bhyve-ldso-list-diagnostics.txt

Just in case it might be useful, I attach the result of ld.so --list-diagnostics.
Comment 56 Konstantin Belousov freebsd_committer freebsd_triage 2024-12-26 08:16:52 UTC
(In reply to Koichiro Iwao from comment #55)
So can you disassemble the function around the faulted address from libc.so,
please?
Comment 57 Konstantin Belousov freebsd_committer freebsd_triage 2024-12-26 08:18:12 UTC
(In reply to Konstantin Belousov from comment #56)
Also, as a blind guess, try to revert this chunk
-				if (width < 0x4)
-					width = 0;
+//				if (width < 0x4)
+//					width = 0;
and see.
Comment 58 Koichiro Iwao freebsd_committer freebsd_triage 2024-12-26 08:23:29 UTC
(In reply to Konstantin Belousov from comment #57)
I'm not familiar with this low-level area, so I need to know how to do the disassembly. Anyway, I'll try reverting the width change first.
Comment 59 Koichiro Iwao freebsd_committer freebsd_triage 2024-12-26 08:55:13 UTC
(In reply to Konstantin Belousov from comment #57)
I re-added the width check, but it didn't help.

$ podman run -it --rm quay.io/almalinuxorg/almalinux:8 /bin/bash
[   16.234882] bash[1001]: segfault at 7ffd851bbd80 ip 00007fd58d40fd10 sp 00007ffd851b9d38 error 4 in libc-2.28.so[7fd58d368000+1cd000] likely on CPU 0 (core 0, socket 0)
[   16.234892] Code: c5 fe 6f 56 40 c5 fe 6f 5e 60 c5 fe 6f a6 00 10 00 00 c5 fe 6f ae 20 10 00 00 c5 fe 6f b6 40 10 00 00 c5 fe 6f be 60 10 00 00 <c5> 7e 6f 86 00 20 00 00 c5 7e 6f 8e 20 20 00 00 c5 7e 6f 96 40 20
Comment 60 Darren Henderson 2025-02-02 18:58:52 UTC
Giving this a bump... haven't seen any movement for a while now. This strikes me as being pretty critical.

It is now possible to upgrade from AlmaLinux 9.4 to 9.5; I gather they reverted the glibc change. The same is not true for Rocky, etc.

You cannot do a fresh install of 9.5 (which makes sense, since the release images are the same as they were in December).
Comment 61 Adhemerval Zanella 2025-02-02 19:07:21 UTC
It has been fixed in glibc [1], although it seems that bhyve still sets the L3 cache size to a bogus value (which might impact performance, since it influences which string optimization is selected at runtime).

So either backport this glibc fix to the affected distros or fix the bhyve L3 cache size report (which should fix boot on the affected distros).

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=32470
Comment 62 Konstantin Belousov freebsd_committer freebsd_triage 2025-02-02 20:12:09 UTC
(In reply to Adhemerval Zanella from comment #61)
Somebody needs to help debug the Alma Linux crash report for the supposed fix.
Comment 63 Florian Weimer 2025-02-02 20:29:18 UTC
(In reply to Konstantin Belousov from comment #62)

The command used was:

> podman run -it --rm quay.io/almalinuxorg/almalinux:8 /bin/bash

This is not going to work until the container image is updated. That hasn't happened. Same issue for quay.io/almalinuxorg/almalinux:9.

I do not see an a8 branch here: https://git.almalinux.org/rpms/glibc
So I'm not sure if there's even a forked glibc package that contains the revert for AlmaLinux 8.

So it doesn't look like there is anything mysterious about the reported failure.
Comment 64 Konstantin Belousov freebsd_committer freebsd_triage 2025-02-03 00:07:06 UTC
(In reply to Florian Weimer from comment #63)
I believe that the Alma issue is different.  It is only for the patched bhyve.
It is probably something that the old glibc wants from CPUID.
Comment 65 Koichiro Iwao freebsd_committer freebsd_triage 2025-02-26 03:10:35 UTC
(In reply to Florian Weimer from comment #63)

Let me sort out the remaining issues. 

1TB L3 cache issue:
- Addressed in glibc upstream
  - AlmaLinux 9.5 and Kitten 10 already include the upstream patch
- Addressed also on bhyve side (temporary patch)

However, the temporary bhyve patch broke glibc on AlmaLinux 8 / EL8 (glibc-2.28-251.el8_10.13). So the temporary bhyve patch might have a regression. This regression prevents the bhyve patch from being merged.

The bhyve patch I mean here is the following:

--- a/sys/amd64/vmm/x86.c
+++ b/sys/amd64/vmm/x86.c
@@ -152,6 +152,8 @@ x86_emulate_cpuid(struct vcpu *vcpu, uint64_t *rax, uint64_t *rbx,
                                 * pkg_id_shift and other OSes may rely on it.
                                 */
                                width = MIN(0xF, log2(threads * cores));
+                               if (width < 0x4)
+                                       width = 0;
                                logical_cpus = MIN(0xFF, threads * cores - 1);
                                regs[2] = (width << AMDID_COREID_SIZE_SHIFT) | logical_cpus;

To reproduce the EL8 glibc issue, run one of the following commands in a PATCHED bhyve environment.
 
$ podman run -it --rm quay.io/almalinuxorg/almalinux:8 /bin/bash
$ podman run -it --rm quay.io/rockylinux/rockylinux:8 /bin/bash

AlmaLinux doesn't have an a8 branch for the glibc package, so there are no AlmaLinux-specific patches. AlmaLinux's glibc and Rocky Linux's are built from the same source.

So what we need to do is, 
- Improve bhyve patch not to break EL8 glibc but fix EL9 glibc issue (1TB L3 cache issue)
- Fix EL8 glibc issue in upstream if it is an upstream issue

I will give you access to a PATCHED bhyve environment if necessary. Send me your SSH public key.
Comment 66 Adhemerval Zanella 2025-02-27 12:30:23 UTC
(In reply to Koichiro Iwao from comment #65)

> - Fix EL8 glibc issue in upstream if it is an upstream issue

If I understand correctly, both AlmaLinux 9.5 and Kitten 10 work correctly now after glibc upstream fixed the issue and it was backported, right? Do they work with and without the bhyve workaround? I am trying to understand whether we still have an upstream issue (on what I understand is the current development branch) or if this is indeed fixed.
Comment 67 Koichiro Iwao freebsd_committer freebsd_triage 2025-02-28 00:26:30 UTC
(In reply to Adhemerval Zanella from comment #66)
> If I understand correctly, both AlmaLinux 9.5 and Kitten 10 works correctly now after glibc upstream fixed the issue and it was backported, right? Do they work with and without the bhyve workaround? 

Yes, that's all correct. You might need to install using the AlmaLinux 9.4 installer ISO and update to the latest 9.5 packages to avoid the affected glibc. Otherwise, use the latest 9.5 GenericCloud image.
Comment 68 Koichiro Iwao freebsd_committer freebsd_triage 2025-02-28 04:29:19 UTC
(In reply to Koichiro Iwao from comment #65)
This wasn't correct.

> The bhyve patch means here is the following:
> 
>--- a/sys/amd64/vmm/x86.c
>+++ b/sys/amd64/vmm/x86.c
>@@ -152,6 +152,8 @@ x86_emulate_cpuid(struct vcpu *vcpu, uint64_t *rax, uint64_t *rbx,
>                                * pkg_id_shift and other OSes may rely on it.
>                                */
>                                width = MIN(0xF, log2(threads * cores));
>+                               if (width < 0x4)
>+                                       width = 0;
>                                logical_cpus = MIN(0xFF, threads * cores - 1);
>                                regs[2] = (width << AMDID_COREID_SIZE_SHIFT) | logical_cpus;

This is the correct patch I meant. 

--- a/sys/amd64/vmm/x86.c
+++ b/sys/amd64/vmm/x86.c
@@ -258,7 +256,7 @@ x86_emulate_cpuid(struct vcpu *vcpu, uint64_t *rax, uint64_t *rbx,
                                func = 3;       /* unified cache */
                                break;
                        default:
-                               logical_cpus = 0;
+                               logical_cpus = sockets * threads * cores;
                                level = 0;
                                func = 0;
                                break;
@@ -268,7 +266,7 @@ x86_emulate_cpuid(struct vcpu *vcpu, uint64_t *rax, uint64_t *rbx,
                        regs[0] = (logical_cpus << 14) | (1 << 8) |
                            (level << 5) | func;
                        regs[1] = (func > 0) ? (CACHE_LINE_SIZE - 1) : 0;
-                       regs[2] = 0;
+                       regs[2] = 1;    /* Num of cache ways */
                        regs[3] = 0;
                        break;
Comment 69 Mark Peek freebsd_committer freebsd_triage 2025-03-04 16:46:57 UTC
Having just received an AMD 7840U I wanted to do a little more research into this bug and the current patch. Given the cache values I am seeing I believe the patch needs a small change.

Looking at the cache output from ld.so --list-diagnostics without the patch, i.e., the current code:
x86.cpu_features.data_cache_size=0x8000
x86.cpu_features.shared_cache_size=0x1000000000
x86.cpu_features.level1_icache_size=0x8000
x86.cpu_features.level1_icache_linesize=0x40
x86.cpu_features.level1_dcache_size=0x8000
x86.cpu_features.level1_dcache_assoc=0x8
x86.cpu_features.level1_dcache_linesize=0x40
x86.cpu_features.level2_cache_size=0x100000
x86.cpu_features.level2_cache_assoc=0x8
x86.cpu_features.level2_cache_linesize=0x40
x86.cpu_features.level3_cache_size=0x1000000000
x86.cpu_features.level3_cache_assoc=0x0
x86.cpu_features.level3_cache_linesize=0x40
x86.cpu_features.level4_cache_size=0x0
x86.cpu_features.cachesize_non_temporal_divisor=0x4

As Florian states in #36, the L3 cache size reporting 1TB is what triggers the bug in glibc-2.40 (or a patched 2.39).

Applying the patch from https://reviews.freebsd.org/D48187 gives these cache values:
x86.cpu_features.data_cache_size=0x80
x86.cpu_features.shared_cache_size=0x2
x86.cpu_features.level1_icache_size=0x80
x86.cpu_features.level1_icache_linesize=0x40
x86.cpu_features.level1_dcache_size=0x80
x86.cpu_features.level1_dcache_assoc=0x1
x86.cpu_features.level1_dcache_linesize=0x40
x86.cpu_features.level2_cache_size=0x80
x86.cpu_features.level2_cache_assoc=0x1
x86.cpu_features.level2_cache_linesize=0x40
x86.cpu_features.level3_cache_size=0x2
x86.cpu_features.level3_cache_assoc=0x1
x86.cpu_features.level3_cache_linesize=0x1
x86.cpu_features.level4_cache_size=0x0
x86.cpu_features.cachesize_non_temporal_divisor=0x4

While the guest apps now work, the cache sizes are too small and not realistic. This is due to cpuid 0x8000001D not being fully implemented.

Looking at the glibc code:
     https://github.com/bminor/glibc/blob/glibc-2.40/sysdeps/x86/dl-cacheinfo.h#L309

As Florian describes in #39, the handle_amd() function first looks at CPUID 0x8000001D for the cache information, which is not providing all of the parameters needed to compute the correct cache sizes.  If 0x8000001D is not available or the returned ecx == 0, it falls back to a legacy mechanism. But for the Zen architecture it will also look at the 0x8000001D eax for NumSharingCache.

To get this fallback to work properly I reverted one of the changes in the proposed patch from Konstantin <https://reviews.freebsd.org/D48187> and only used:
--- a/sys/amd64/vmm/x86.c
+++ b/sys/amd64/vmm/x86.c
@@ -150,8 +150,6 @@ x86_emulate_cpuid(struct vcpu *vcpu, uint64_t *rax, uint64_t *rbx,
                                 * pkg_id_shift and other OSes may rely on it.
                                 */
                                width = MIN(0xF, log2(threads * cores));
-                               if (width < 0x4)
-                                       width = 0;
                                logical_cpus = MIN(0xFF, threads * cores - 1);
                                regs[2] = (width << AMDID_COREID_SIZE_SHIFT) | logical_cpus;
                        }
@@ -256,7 +254,7 @@ x86_emulate_cpuid(struct vcpu *vcpu, uint64_t *rax, uint64_t *rbx,
                                func = 3;       /* unified cache */
                                break;
                        default:
-                               logical_cpus = 0;
+                               logical_cpus = sockets * threads * cores;
                                level = 0;
                                func = 0;
                                break;

The reverted change keeps 0x8000001D ecx == 0 to prevent use of 0x8000001D in handle_amd(), while still setting a better value for NumSharingCache for use in the legacy code path. The reported cache sizes with this change show:

x86.cpu_features.data_cache_size=0x8000
x86.cpu_features.shared_cache_size=0x2000000
x86.cpu_features.level1_icache_size=0x8000
x86.cpu_features.level1_icache_linesize=0x40
x86.cpu_features.level1_dcache_size=0x8000
x86.cpu_features.level1_dcache_assoc=0x8
x86.cpu_features.level1_dcache_linesize=0x40
x86.cpu_features.level2_cache_size=0x100000
x86.cpu_features.level2_cache_assoc=0x8
x86.cpu_features.level2_cache_linesize=0x40
x86.cpu_features.level3_cache_size=0x2000000
x86.cpu_features.level3_cache_assoc=0x0
x86.cpu_features.level3_cache_linesize=0x40
x86.cpu_features.level4_cache_size=0x0
x86.cpu_features.cachesize_non_temporal_divisor=0x4

These values look more reasonable and fix the guest issues on my system.

I would like to see if this matches what other people are seeing for cache sizes with this patch, and whether it resolves any outstanding issues.
Comment 70 Konstantin Belousov freebsd_committer freebsd_triage 2025-03-05 00:47:15 UTC
(In reply to Mark Peek from comment #69)
Thank you for the analysis.  I realized that it is just a bug in the patch.
The intent was to set the number of cache ways to 1, but I ignored the 'number
of ways is the value returned plus one' part of the spec.

I updated the patch, basically with your revert, and added a comment
explaining the intent.
Comment 71 Koichiro Iwao freebsd_committer freebsd_triage 2025-03-05 08:50:41 UTC
(In reply to Mark Peek from comment #69)
With your patch, EL8 glibc no longer crashes. It looks good to me as far as this issue is concerned. However, it still reports the wrong L3 cache size.

My processor is an AMD Ryzen 7 5700G (8 cores/16 threads), so it has the following cache sizes:
L1: 64KiB (32KiB instruction + 32KiB data/core)
L2: 4MiB (512KiB/core)
L3: 16MiB
https://www.techpowerup.com/cpu-specs/ryzen-7-5700g.c2472

(AlmaLinux 9 on bhyve with D48187 patch)
$ ld.so --list-diagnostics | grep cache
x86.cpu_features.data_cache_size=0x8000
x86.cpu_features.shared_cache_size=0x2000000
x86.cpu_features.level1_icache_size=0x8000    # 32768 -> 32KiB
x86.cpu_features.level1_icache_linesize=0x40
x86.cpu_features.level1_dcache_size=0x8000    # 32768 -> 32KiB
x86.cpu_features.level1_dcache_assoc=0x8
x86.cpu_features.level1_dcache_linesize=0x40
x86.cpu_features.level2_cache_size=0x80000    # 524288 -> 512KiB
x86.cpu_features.level2_cache_assoc=0x8
x86.cpu_features.level2_cache_linesize=0x40
x86.cpu_features.level3_cache_size=0x2000000  # 33554432 -> 32MiB <= WRONG!
x86.cpu_features.level3_cache_assoc=0x0
x86.cpu_features.level3_cache_linesize=0x40
x86.cpu_features.level4_cache_size=0x0
x86.cpu_features.cachesize_non_temporal_divisor=0x4
Comment 72 Konstantin Belousov freebsd_committer freebsd_triage 2025-03-05 08:56:42 UTC
(In reply to Koichiro Iwao from comment #71)
In what sense this is wrong?
x86.cpu_features.level3_cache_size=0x2000000  # 33554432 -> 32MiB <= WRONG!

32 MB is a relatively reasonable number.  What misbehavior do you see?
Comment 73 Koichiro Iwao freebsd_committer freebsd_triage 2025-03-05 09:39:19 UTC
(In reply to Konstantin Belousov from comment #72)
> In what sense this is wrong?
> 32MB is relatively reasonable number.  What misbehavior do you see?

The processor actually has only a 16 MB L3 cache, but bhyve reports double that value. Maybe you meant that it's normal for the cache size not to match the physical CPU? Sorry if that is completely normal.

On my other hardware, which has an Intel CPU, bhyve reports exactly the same cache sizes as the physical CPU. That's why I considered this still wrong.

(AlmaLinux 9 on vanilla bhyve 14.2-RELEASE on Intel Celeron N5105)
$ ld.so --list-diagnostics | grep cache
x86.cpu_features.data_cache_size=0x8000
x86.cpu_features.shared_cache_size=0x400000
x86.cpu_features.level1_icache_size=0x8000
x86.cpu_features.level1_icache_linesize=0x40
x86.cpu_features.level1_dcache_size=0x8000
x86.cpu_features.level1_dcache_assoc=0x8
x86.cpu_features.level1_dcache_linesize=0x40
x86.cpu_features.level2_cache_size=0x180000
x86.cpu_features.level2_cache_assoc=0xc
x86.cpu_features.level2_cache_linesize=0x40
x86.cpu_features.level3_cache_size=0x400000
x86.cpu_features.level3_cache_assoc=0x10
x86.cpu_features.level3_cache_linesize=0x40
x86.cpu_features.level4_cache_size=0xffffffffffffffff
x86.cpu_features.cachesize_non_temporal_divisor=0x4
Comment 74 Konstantin Belousov freebsd_committer freebsd_triage 2025-03-05 10:57:13 UTC
(In reply to Koichiro Iwao from comment #73)
Yes, this is normal. We do not aim to report the host values, only something
that makes the guest accept the values.

So no other issues?
Comment 75 Koichiro Iwao freebsd_committer freebsd_triage 2025-03-05 11:40:25 UTC
(In reply to Konstantin Belousov from comment #74)
Yes, no issues as far as I tested.
Comment 76 commit-hook freebsd_committer freebsd_triage 2025-03-05 12:29:18 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=0698ce429f78f548f7eb3e54476fb312109ddd8b

commit 0698ce429f78f548f7eb3e54476fb312109ddd8b
Author:     Konstantin Belousov <kib@FreeBSD.org>
AuthorDate: 2024-12-17 21:09:33 +0000
Commit:     Konstantin Belousov <kib@FreeBSD.org>
CommitDate: 2025-03-05 12:27:58 +0000

    bhyve: fix CPUID L3 Cache Size reporting for AMD/SVM

    Adjust leaf 0x8000_001D %ecx 3 on AMD (L3 cache params).
    - Report cache as 1-way associative.  Glibc does not believe that there
      are fully associative L3 caches, ignoring the leaf and falling back to
      legacy way of reading cache params.
    - Do not report 4095 logical CPUs per L3 cache, report the true total
      number of emulated CPUs.  The insanely large value tricked some
      version of glibc to overflow 32bit calculation of the L3 cache size,
      as reported in the PR.

    Also, for leaf 0x8000_0008, do not clip ApicIdSize to zero if less than
    4.  This effectively falls back to legacy.

    PR:     279901
    With the help from:     Florian Weimer <fweimer@redhat.com>
    Reviewed by:    kevans, meta, mp
    Tested by:      meta, mp
    Sponsored by:   The FreeBSD Foundation
    MFC after:      1 week
    Differential revision:  https://reviews.freebsd.org/D48187

 sys/amd64/vmm/x86.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)
Comment 77 commit-hook freebsd_committer freebsd_triage 2025-03-12 00:25:54 UTC
A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=f9857ea43a38b74e34ce7f6576ad4e6415413454

commit f9857ea43a38b74e34ce7f6576ad4e6415413454
Author:     Konstantin Belousov <kib@FreeBSD.org>
AuthorDate: 2024-12-17 21:09:33 +0000
Commit:     Konstantin Belousov <kib@FreeBSD.org>
CommitDate: 2025-03-12 00:25:02 +0000

    bhyve: fix CPUID L3 Cache Size reporting for AMD/SVM

    PR:     279901

    (cherry picked from commit 0698ce429f78f548f7eb3e54476fb312109ddd8b)

 sys/amd64/vmm/x86.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)
Comment 78 Mark Linimon freebsd_committer freebsd_triage 2025-05-11 11:16:19 UTC
^Triage: assign to committer who resolved.