Bug 275322 - Improper handling of mxcsr register during debug (gdb/lldb)
Summary: Improper handling of mxcsr register during debug (gdb/lldb)
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: misc
Version: 14.0-RELEASE
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: Konstantin Belousov
Reported: 2023-11-25 03:42 UTC by Cheyenne Wills
Modified: 2024-11-25 23:44 UTC

Attachments
C code to demonstrate the problem (940 bytes, text/plain)
2023-11-25 03:42 UTC, Cheyenne Wills

Description Cheyenne Wills 2023-11-25 03:42:00 UTC
Created attachment 246554
C code to demonstrate the problem

The mxcsr register is handled improperly during debugging.  It appears that the debugger is not given the current value of mxcsr.

The attached program illustrates the problem.

Compile the attached program:
clang -mfpmath=sse -mlong-double-64 -g -lm -o {x} {x.c}

Run the following debug session:

gdb ./{x}
break main
run
** program should run to end with the following output:

ra=7.01209994486364354e-310 reatt=10000000000 ra * reatt-> result =0 savemxcsr 00001f80 mxcsr_set 00009fc0 showmxcsr 00009fc0 mxcsr 00001f80

disassemble main
** set a break at the mulsd instruction (should be around offset +81)
run
** should now be at the mulsd instruction
continue
** output should be the same as above

run
** should now be at the mulsd instruction
info register mxcsr
** output should be:
mxcsr          0x1f80              [ IM DM ZM OM UM PM ]
** which is incorrect, since there was an ldmxcsr instruction around offset +39, and the result should be:
mxcsr          0x9fc0              [ DAZ IM DM ZM OM UM PM FZ ]

continue
** the program now prints an incorrect result:
ra=7.01209994486364354e-310 reatt=10000000000 ra * reatt-> result =7.01209994486364403e-300 savemxcsr 00001f80 mxcsr_set 00009fc0 showmxcsr 00009fc0 mxcsr 00001fa2
** it appears that the mxcsr register is reset to the 0x1f80 value after it is displayed
Comment 1 Kyle Evans freebsd_committer freebsd_triage 2023-11-25 03:52:56 UTC
jhb was already discussing some of the bits leading up to this on IRC, but maybe it's a ptrace thing that kib would be interested in; cc'ing both.
Comment 2 Cheyenne Wills 2023-11-25 04:52:27 UTC
(In reply to Kyle Evans from comment #1)

I might be able to reduce the C code even more if needed.

Some background on how I stumbled on this.

I'm trying to debug some of my own assembly code that uses SSE2 floating-point instructions.  The code runs fine under Linux, and I wanted to see whether it could be ported to FreeBSD.  It ended up stuck in a loop where it tries to scale a floating-point number, and I got inconsistent results whenever I ran the program under gdb.

The assembly source sets mxcsr to control how underflow, overflow, etc. are handled within the assembly code, and the assembly code relies on those settings.  Whenever it switches back to a C function or makes a system call, it first restores mxcsr to the "original" value, that is, the value it held when the assembly code got control from the calling C code.
Comment 3 Konstantin Belousov freebsd_committer freebsd_triage 2023-11-25 06:35:25 UTC
I cannot reproduce this.  For me it looks like this:
Breakpoint 1, main (argc=1, argv=0x7fffffffe5f8) at pr275322.c:14
14      in pr275322.c
(gdb) c
Continuing.
pid 99860 comm pr275322: signal 5 err 0 code 2 type 10 addr 0x201703 rsp 0x7fffffffe540 rip 0x201703 rax 0<8b 45 ec 89 04 25 30 3a>
pid 99860 comm pr275322: signal 5 err 0 code 1 type 3 addr 0x201742 rsp 0x7fffffffe540 rip 0x201742 rax 0x9fc0<44 0f 59 e0 f2 44 0f 11>

Breakpoint 2, 0x0000000000201741 in main (argc=1, argv=0x7fffffffe5f8)
    at pr275322.c:18
18      in pr275322.c
(gdb) info registers mxcsr
mxcsr          0x9fc0              [ DAZ IM DM ZM OM UM PM FZ ]
(gdb) c
Continuing.
pid 99860 comm pr275322: signal 5 err 0 code 2 type 10 addr 0x201746 rsp 0x7fffffffe540 rip 0x201746 rax 0x9fc0<f2 44 0f 11 24 25 40 3a>
ra=7.01209994486364354e-310 reatt=10000000000 ra * reatt-> result =0 savemxcsr 00001f80 mxcsr_set 00009fc0 showmxcsr 00009fc0 mxcsr 00009fc0
[Inferior 1 (process 99860) exited normally]

I tried this on two machines, one being KabyLake, and another Goldmont.

Are you trying this on bare metal?  Which CPU does the problem
occur on?
Comment 4 Konstantin Belousov freebsd_committer freebsd_triage 2023-11-25 06:36:10 UTC
[Both machines are on several days old stable/14]
Comment 5 Cheyenne Wills 2023-11-26 01:07:22 UTC
(In reply to Konstantin Belousov from comment #3)

This is failing on an AMD Ryzen 9 5900X 12-Core Processor

    CPU family:          25
    Model:               33

Flags:               fpu de tsc msr pae mce cx8 apic mca cmov pat clflush mmx
                     fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm
                     constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid
                     tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2
                     movbe popcnt aes xsave avx f16c rdrand hypervisor
                     lahf_lm cmp_legacy abm sse4a misalignsse 3dnowprefetch
                     bpext ibpb vmmcall fsgsbase bmi1 avx2 bmi2 erms rdseed
                     adx clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 clzero
                     xsaveerptr arat vaes vpclmulqdq rdpid fsrm

Running under xen-4.17.3 with a Gentoo DOM0.

Running the same test in a different guest (Linux) did not show the same failure.

I copied the FreeBSD disk image over to a different system (running kvm on an Intel Xeon processor) and was unable to reproduce the problem there.
Comment 6 Konstantin Belousov freebsd_committer freebsd_triage 2023-11-26 05:55:12 UTC
(In reply to Cheyenne Wills from comment #5)
I think that the mandatory next step is to try this on bare metal Ryzen.
I want to exclude the hypervisor before trying to remember any AMD quirks.
Comment 7 Mark Johnston freebsd_committer freebsd_triage 2023-11-28 15:47:55 UTC
When I run this on a Ryzen 7950X3D running main, I get:

ra=7.01209994486364354e-310 reatt=10000000000 ra * reatt-> result =0 savemxcsr 00001f80 mxcsr_set 00009fc0 showmxcsr 00009fc0 mxcsr 00009fc0

when I have a breakpoint on the "mulsd" instruction:

ra=7.01209994486364354e-310 reatt=10000000000 ra * reatt-> result =0 savemxcsr 00001f80 mxcsr_set 00009fc0 showmxcsr 00009fc0 mxcsr 00009fc0

while stopped at "mulsd":

(gdb) info register mxcsr
mxcsr          0x1f80              [ IM DM ZM OM UM PM ]
(gdb) c
Continuing.
ra=7.01209994486364354e-310 reatt=10000000000 ra * reatt-> result =7.01209994486364403e-300 savemxcsr 00001f80 mxcsr_set 00009fc0 showmxcsr 00009fc0 mxcsr 00001fa2

So I believe the problem is reproducible here. In particular, the output of the test program varies depending on whether I print the value of mxcsr from gdb or not.
Comment 8 Olivier Certner freebsd_committer freebsd_triage 2023-11-28 16:53:49 UTC
On a bare-metal Ryzen 3900X running 13-STABLE, I get:

ra=7.01209994486364354e-310 reatt=10000000000 ra * reatt-> result =0 savemxcsr 00001f80 mxcsr_set 00009fc0 showmxcsr 00009fc0 mxcsr 00009fc0

With a breakpoint I see the same, and when stopped at "mulsd":
(gdb) info register mxcsr
mxcsr          0x1f80              [ IM DM ZM OM UM PM ]
(gdb) continue
Continuing.
ra=7.01209994486364354e-310 reatt=10000000000 ra * reatt-> result =7.01209994486364403e-300 savemxcsr 00001f80 mxcsr_set 00009fc0 showmxcsr 00009fc0 mxcsr 00001fa2

So I get the same results as markj@ (comment #7), and the same as Cheyenne Wills (in the description) except for mxcsr in the first print (I get 00009fc0).  Given the code, I don't see how one could possibly obtain 00001f80 there.  In any case, there is a discrepancy between a normal run (or a debugged run without "info register mxcsr") and a debugged run with "info register mxcsr".
Comment 9 Cheyenne Wills 2023-12-01 18:14:39 UTC
(In reply to Mark Johnston from comment #7)

Yes -- you were able to reproduce the problem.

Thanks for looking into this.
Comment 10 Konstantin Belousov freebsd_committer freebsd_triage 2024-03-27 11:18:12 UTC
Finally got dev access to an AMD machine.
https://reviews.freebsd.org/D44522
Comment 11 commit-hook freebsd_committer freebsd_triage 2024-03-28 11:57:56 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=1c091d11261a3c8cc3728b92760e65242c0f5949

commit 1c091d11261a3c8cc3728b92760e65242c0f5949
Author:     Konstantin Belousov <kib@FreeBSD.org>
AuthorDate: 2024-03-27 11:01:44 +0000
Commit:     Konstantin Belousov <kib@FreeBSD.org>
CommitDate: 2024-03-28 11:56:21 +0000

    x86: handle MXCSR from XSAVEOPT when x87 state was optimized

    PR:     275322
    Reported by:    Cheyenne Wills <cheyenne.wills@gmail.com>
    Reviewed by:    emaste, jhb, olce
    Sponsored by:   The FreeBSD Foundation
    MFC after:      1 week
    Differential revision:  https://reviews.freebsd.org/D44522

 sys/amd64/amd64/fpu.c | 21 +++++++++++++++++++++
 sys/i386/i386/npx.c   | 21 +++++++++++++++++++++
 2 files changed, 42 insertions(+)
Comment 12 commit-hook freebsd_committer freebsd_triage 2024-04-02 09:00:18 UTC
A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=38fdb37047ead625b97b82e8f6532ed7bc404ee3

commit 38fdb37047ead625b97b82e8f6532ed7bc404ee3
Author:     Konstantin Belousov <kib@FreeBSD.org>
AuthorDate: 2024-03-27 11:01:44 +0000
Commit:     Konstantin Belousov <kib@FreeBSD.org>
CommitDate: 2024-04-02 08:58:20 +0000

    x86: handle MXCSR from XSAVEOPT when x87 state was optimized

    PR:     275322

    (cherry picked from commit 1c091d11261a3c8cc3728b92760e65242c0f5949)

 sys/amd64/amd64/fpu.c | 21 +++++++++++++++++++++
 sys/i386/i386/npx.c   | 21 +++++++++++++++++++++
 2 files changed, 42 insertions(+)
Comment 13 commit-hook freebsd_committer freebsd_triage 2024-04-02 15:18:30 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=b79fb02b32d930b1940ad7a21cb17896c848d432

commit b79fb02b32d930b1940ad7a21cb17896c848d432
Author:     Konstantin Belousov <kib@FreeBSD.org>
AuthorDate: 2024-03-27 11:01:44 +0000
Commit:     Konstantin Belousov <kib@FreeBSD.org>
CommitDate: 2024-04-02 15:18:00 +0000

    x86: handle MXCSR from XSAVEOPT when x87 state was optimized

    PR:     275322

    (cherry picked from commit 1c091d11261a3c8cc3728b92760e65242c0f5949)

 sys/amd64/amd64/fpu.c | 21 +++++++++++++++++++++
 sys/i386/i386/npx.c   | 21 +++++++++++++++++++++
 2 files changed, 42 insertions(+)