Bug 264590 - assembler generates wrong opcodes of instructions fdiv fdivp fdivr fdivrp fsub fsubp fsubr fsubrp
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: gnu
Version: 11.4-RELEASE
Hardware: amd64 Any
Importance: --- Affects Many People
Assignee: freebsd-toolchain (Nobody)
Reported: 2022-06-10 09:52 UTC by var
Modified: 2022-06-11 12:06 UTC
Description var 2022-06-10 09:52:31 UTC
FreeBSD 11.2-RELEASE FreeBSD 11.2-RELEASE #0 r335510: Fri Jun 22 04:32:14 UTC 2018     root@releng2.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC  amd64

Compiler:  clang6, gcc9, bcc32x.exe (Embarcadero, Windows)

Sample code:
--------------------------------------------
#include <stdio.h>

static long double sd(void)
{
   long double y;
   __asm__ ("\n\t"
            "fldpi \n\t"
            "fld1 \n\t"
            "fdivp \n\t"
            : "=t"(y)
            :
            :
            );        // st(1)=pi, st(0)=1
   return y;
}



int main(void)
{
   printf("\t%.18Lg\n", sd());
   return 0;
}
-------------------------------------------- 

Output:    0.318309886183790672 = 1/pi
Expected:  3.14159... = pi/1

Content of file.o:
0b+096  c4 10 5d c3 55 48 89 e5  >> d9 eb, d9 e8, de f1 << 5d c3
.                                   ^^^^^  ^^^^^  ^^^^^ 
Opcodes of 'fldpi', 'fld1' and the division.
The emitted opcode 'de f1' is wrong: it encodes 'fdivrp',
not 'de f9' for the 'fdivp' that was written.
The same wrong translation occurs for the fdivxx and fsubxx instructions.
Comment 1 Li-Wen Hsu freebsd_committer freebsd_triage 2022-06-10 10:46:55 UTC
11.x is no longer supported. Can you check whether this still happens on 13.1 or even -CURRENT? Thanks!
Comment 2 Stefan Eßer freebsd_committer freebsd_triage 2022-06-10 12:47:20 UTC
(In reply to Li-Wen Hsu from comment #1)

Just checked with clang-14 and gcc11 on -CURRENT/amd64: the result is identical to the one reported (0.31...).

Disassembly of sd() with objdump:

00000000002018e0 <sd>:
  2018e0:       55                      push   %rbp
  2018e1:       48 89 e5                mov    %rsp,%rbp
  2018e4:       d9 eb                   fldpi  
  2018e6:       d9 e8                   fld1   
  2018e8:       de f1                   fdivp  %st,%st(1)
  2018ea:       db 7d f0                fstpt  -0x10(%rbp)
  2018ed:       db 6d f0                fldt   -0x10(%rbp)
  2018f0:       5d                      pop    %rbp
  2018f1:       c3                      ret

Intel Architecture Software Developer’s Manual - Volume 2: Instruction Set Reference:

> Opcode   Instruction         Description
> D8 /6    FDIV m32real        Divide ST(0) by m32real and store result in ST(0)
> DC /6    FDIV m64real        Divide ST(0) by m64real and store result in ST(0)
> D8 F0+i  FDIV ST(0),ST(i)    Divide ST(0) by ST(i) and store result in ST(0)    
> DC F8+i  FDIV ST(i),ST(0)    Divide ST(i) by ST(0) and store result in ST(i)
> DE F8+i  FDIVP ST(i),ST(0)   Divide ST(i) by ST(0), store result in ST(i), and pop the register stack

The byte sequence "de f1" does not exist in the reference manual, but if it is decoded by the processor, then the disassembled instruction "fdivp  %st,%st(1)" might behave in this (undocumented) way:

> DE F0+i  FDIVP ST(0),ST(i)   Divide ST(0) by ST(i), store result in ST(i), and pop the register stack

And that would explain the result obtained.

Maybe the assembler instruction "fdivp \n\t" is interpreted as if it were "fdivp %st,%st(1) \n\t", i.e. with the operands reversed from what you'd expect?
Comment 3 var 2022-06-10 13:06:30 UTC
Intel:
FDIV  ST(0), ST(i)  D8 F0+i  Divide ST(0) by ST(i) and store result in ST(0).
FDIV  ST(i), ST(0)  DC F8+i  Divide ST(i) by ST(0) and store result in ST(i).
FDIVP ST(i), ST(0)  DE F8+i  Divide ST(i) by ST(0), store result in ST(i), and pop the register stack.
FDIVP               DE F9    Divide ST(1) by ST(0), store result in ST(1), and pop the register stack.
FDIVRP              DE F1    Divide ST(0) by ST(1), store result in ST(1), and pop the register stack.


AMD:
FDIV  ST(0), ST(i)  D8 F0+i  Replace ST(0) with ST(0)/ST(i).
FDIV  ST(i), ST(0)  DC F8+i  Replace ST(i) with ST(i)/ST(0).
FDIVP ST(i), ST(0)  DE F8+i  Replace ST(i) with ST(i)/ST(0), and pop the x87 register stack.
FDIVP               DE F9    Replace ST(1) with ST(1)/ST(0), and pop the x87 register stack.


My opcodes 'DE F9' and 'DE F1' are correct.
Comment 4 var 2022-06-10 13:19:04 UTC
The assembler exchanges  fdivp <==> fdivrp, fsubp <==> fsubrp

Old code from 1991:
To get correct behavior _today_, I have to change
fsubr --> fsub and fdivr --> fdiv in it.
--------------------------------------------------------
;	sc/24.1.91
	TITLE	acos87
	.386
	.387
	.MODEL small
PUBLIC  _acos87
	.DATA
COMM _deg_87:DWORD
	.DATA?
	.CONST
$radtodeg	DT	57.295779513082320876798
ALIGN 4
	.CODE
_acos87	PROC
	fld  	QWORD PTR [esp+4]
	fld	st
	fmul	st, st
	fld1
	fsubr
	fsqrt
	fdivr
	fld1
	fpatan
	mov	eax, _deg_87
	cmp	eax, 0
	jg	SHORT $deg
	ret
Comment 5 var 2022-06-10 13:41:14 UTC
0000000000000024 <sd>:
  24:   55                      push   %rbp
  25:   48 89 e5                mov    %rsp,%rbp
  28:   d9 eb                   fldpi  
  2a:   d9 e8                   fld1   
  2c:   de f1                   fdivp  %st,%st(1)
  2e:   5d                      pop    %rbp
  2f:   c3                      retq   

'objdump' gets it wrong too.
'de f1' is NOT the opcode of 'fdivp', but of 'fdivrp'!
Comment 6 var 2022-06-10 14:13:02 UTC
clang -S -masm=att test.c
        #APP
	fldpi
	fld1
	fdivp	%st(1)
	#NO_APP

clang -S -masm=intel test.c
	#APP
	fldpi
	fld1
	fdivrp	st(1)
	#NO_APP

Both generated from the _same_, unchanged source:
   __asm__ ("\n\t"
            "fldpi \n\t"
            "fld1 \n\t"
            "fdivp \n\t"
            : "=t"(y)
            :
            :
            );


This wrong behavior is truly powerful...
Comment 7 Mark Millard 2022-06-10 14:20:13 UTC
Very old issue. See, for example:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=30117

Part of that has some history on the issue:

Andrew Pinski 2006-12-07 22:40:52 UTC
This is at most a GNU binutils bug.  Please file it with them at http://sourceware.org/bugzilla/ .

Also IIRC fdivp's arguments are swapped in AT&T asm mode because of some historical accident.
See the comment in i386.c:
          /* The SystemV/386 SVR3.2 assembler, and probably all AT&T
             derived assemblers, confusingly reverse the direction of
             the operation for fsub{r} and fdiv{r} when the
             destination register is not st(0).  The Intel assembler
             doesn't have this brain damage.  Read !SYSV386_COMPAT to
             figure out what the hardware really does.  */


Also:
#ifndef SYSV386_COMPAT
/* Set to 1 for compatibility with brain-damaged assemblers.  No-one
   wants to fix the assemblers because that causes incompatibility
   with gcc.  No-one wants to fix gcc because that causes
   incompatibility with assemblers...  You can use the option of
   -DSYSV386_COMPAT=0 if you recompile both gcc and gas this way.  */
#define SYSV386_COMPAT 1
#endif
Comment 8 var 2022-06-11 12:06:40 UTC
# define EXCHANGE  1
#if EXCHANGE > 0
# define fsubp   "fsubrp"
# define fsubrp  "fsubp"
# define fdivp   "fdivrp"
# define fdivrp  "fdivp"
#else
# define fsubp   "fsubp"
# define fsubrp  "fsubrp"
# define fdivp   "fdivp"
# define fdivrp  "fdivrp"
#endif
#undef EXCHANGE


A countermeasure.