Bug 264094 - VNET and DPCPU defined in modules panic on use on arm64 after LLVM 14 import due to new linker relaxation
Summary: VNET and DPCPU defined in modules panic on use on arm64 after LLVM 14 import ...
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 13.2-RELEASE
Hardware: arm64 Any
: --- Affects Many People
Assignee: freebsd-toolchain (Nobody)
URL:
Keywords: crash, needs-qa
Depends on:
Blocks: 14.0r
  Show dependency treegraph
 
Reported: 2022-05-20 08:25 UTC by Marek Zarychta
Modified: 2023-08-01 20:34 UTC (History)
11 users (show)

See Also:
koobs: maintainer-feedback? (toolchain)
koobs: mfc-stable13?
koobs: mfc-stable12?


Attachments
stack backtrace (1.17 KB, text/plain)
2022-05-20 08:25 UTC, Marek Zarychta
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Marek Zarychta 2022-05-20 08:25:23 UTC
Created attachment 234045 [details]
stack backtrace

After the upgrade to FreeBSD 14.0-CURRENT #3 main-n255718-f2ab9160844 using previosly working net.inet.tcp.cc.algorithm=htcp introduces panic triggered by sendmail starting up. Reverting to default net.inet.tcp.cc.algorithm=newreno setiing solves the issue.
The affected architecture is ARM64, last working version was FreeBSD 14.0-CURRENT #1 main-n253944-b2cf3c2125b: Thu Mar 24 18:17:21 CET 2022.
Comment 1 Michael Tuexen freebsd_committer freebsd_triage 2022-05-20 09:39:00 UTC
I can reproduce the issue, will look into it.
Comment 2 Michael Tuexen freebsd_committer freebsd_triage 2022-05-22 20:44:19 UTC
The panic happens on arm64, but not amd64. It does happen when using clang14 (most recent version in the main tree), it does not happen when using clang13. I also does not happen using clang14 when forcing htcp_recalc_beta() not to be inlined.

The panic happens when accessing V_htcp_adaptive_backoff in
https://cgit.freebsd.org/src/tree/sys/netinet/cc/cc_htcp.c#n471

I disassembled htcp_recalc_beta() when using clang14 and the function not being inlined. This is the relevant code:

(kgdb) disassemble htcp_recalc_beta
Dump of assembler code for function htcp_recalc_beta:
  0x00000000000113cc <+0>:	stp	x29, x30, [sp, #-16]!
  0x00000000000113d0 <+4>:	mov	x29, sp
  0x00000000000113d4 <+8>:	ldr	x8, [x0]          ; x8 = ccv
  0x00000000000113d8 <+12>:	ldr	x9, [x18]         ; x9 = curthread
  0x00000000000113dc <+16>:	adrp	x10, 0x21000      ; x10 = ???
  0x00000000000113e0 <+20>:	ldr	x9, [x9, #1368]   ; x9 = curthread->td_vnet
  0x00000000000113e4 <+24>:	ldr	x10, [x10, #2168] ; x10 = ???
  0x00000000000113e8 <+28>:	ldr	x9, [x9, #40]     ; x9 = curthread->td_vnet->vnet_data_base
  0x00000000000113ec <+32>:	ldr	w9, [x9, x10]     ; w9 = V_htcp_adaptive_backoff ???
  0x00000000000113f0 <+36>:	cbz	w9, 0x11428 <htcp_recalc_beta+92>

I don't understand the computations in relation to x10, which is the offset used to get the relevant variable.

However, this code works.

Looking at the code generated by clang13 when htcp_recalc_beta() is inlined, one gets:

  0xffff000150610f28 <+212>:	ldr	x10, [x0]                ; x10 = ccv
  0xffff000150610f2c <+216>:	ldr	x11, [x18]               ; x11 = curthread
  0xffff000150610f30 <+220>:	ldr	x11, [x11, #1368]        ; x11 = curthread->td_vnet
  0xffff000150610f34 <+224>:	ldr	x12, [x11, #40]          ; x12 = curthread->td_vnet->vnet_data_base
  0xffff000150610f38 <+228>:	adrp	x11, 0xffff000150621000  ; ???
  0xffff000150610f3c <+232>:	ldr	x11, [x11, #2256]        ; ???
  0xffff000150610f40 <+236>:	ldr	w12, [x12, x11]
  0xffff000150610f44 <+240>:	cbz	w12, 0xffff000150610f7c <htcp_ack_received+296>

It looks similar and it does work.

Now comes the inlined code from clang14:

  0xffff0001016acf28 <+212>:	ldr	x10, [x0]         ; x10 = ccv
  0xffff0001016acf2c <+216>:	ldr	x11, [x18]        ; x11 = curthread
  0xffff0001016acf30 <+220>:	ldr	x12, [x11, #1368] ; x12 = curthread->td_vnet
  0xffff0001016acf34 <+224>:	nop
  0xffff0001016acf38 <+228>:	adr	x11, 0xffff0001016bd520 <vnet_entry_htcp_adaptive_backoff>
  0xffff0001016acf3c <+232>:	ldr	x12, [x12, #40]   ; x12 = curthread->td_vnet->vnet_data_base
==>0xffff0001016acf40 <+236>:	ldr	w12, [x12, x11]
  0xffff0001016acf44 <+240>:	cbz	w12, 0xffff0001016acf7c <htcp_ack_received+296>

I reached out at arm-freebsd@freebsd.org for some help regarding the generated code.
Comment 3 Marek Zarychta 2022-05-22 21:12:05 UTC
Thank you for digging into this. It looks like a clang14 flaw. Please see: bug 264021 and probably also bug 264115.
Comment 4 Jessica Clarke freebsd_committer freebsd_triage 2022-05-22 21:18:37 UTC
This looks like it's caused by https://github.com/llvm/llvm-project/commit/8acc3b4ab0c76b9c2a54182e31a02f90ebb96329 given the loss of a LDR of the ADRP and the insertion of a NOP (ignoring the instruction rescheduling that changes register allocation slightly).
Comment 5 Jessica Clarke freebsd_committer freebsd_triage 2022-05-22 21:19:12 UTC
Well, that composed with https://github.com/llvm/llvm-project/commit/4450a2a23df0e7081ca7fee3ec641774afedc2bc
Comment 6 Jessica Clarke freebsd_committer freebsd_triage 2022-05-22 21:45:15 UTC
To confirm, you are using cc_htcp.ko rather than compiling it into your kernel?
Comment 7 Marek Zarychta 2022-05-22 21:54:53 UTC
(In reply to Jessica Clarke from comment #6)
>To confirm, you are using cc_htcp.ko rather than compiling it into your kernel?

Of course. Good idea. I didn't know it is even possible to build the kernel with cc_htpc support.
Comment 8 Jessica Clarke freebsd_committer freebsd_triage 2022-05-22 21:58:14 UTC
(In reply to Marek Zarychta from comment #7)

In the kernel should work as it's prepared for those kinds of optimisations, but kernel modules should not, as they rely on the indirection persisting and being rewritten on load. Marking the special vnet and dpcpu symbols as weak might be sufficient to stop the relaxations, but I wouldn't like to say for sure, kernel modules get build with weird symbol interposition semantics that have always been asking for trouble.
Comment 9 John F. Carr 2022-05-22 23:11:15 UTC
The problem is introduced in the process of turning the .o into a .ko. Pinpointing the bug requires study of the exact definition of the ARM relocation types, in assembly and in ELF.

If you change -c to -S in the command for building cc_htcp.o you get two consecutive lines of assembly

        adrp    x11, :got:vnet_entry_htcp_adaptive_backoff
        ldr     x11, [x11, :got_lo12:vnet_entry_htcp_adaptive_backoff]

In cc_htcp.o these become (objdump --disassemble --reloc)

 200:	9000000b 	adrp	x11, 0 <htcp_mod_init>
			200: R_AARCH64_ADR_GOT_PAGE	vnet_entry_htcp_adaptive_backoff
 204:	f940016b 	ldr	x11, [x11]
			204: R_AARCH64_LD64_GOT_LO12_NC	vnet_entry_htcp_adaptive_backoff

In cc_htcp.ko these are incorrectly optimized after the linker determines the variable is close to the instruction:

   10f50:	d503201f 	nop
   10f54:	10082f6b 	adr	x11, 21540 <vnet_entry_htcp_adaptive_backoff>

This computes the address of the value instead of loading the value.  The optimization would be correct if there were an adrp+adr pair but it is not correct for adrp+ld.  So is :got_lo12: the wrong prefix, is R_AARCH64_LD64_GOT_LO12_NC the wrong binary relocation code, or did the linker forget to check the opcode of the instruction it rewrote?
Comment 10 Jessica Clarke freebsd_committer freebsd_triage 2022-05-22 23:25:43 UTC
(In reply to John F. Carr from comment #9)

The optimisation is correct for non-preemptible symbols. It's not correct for preemptible symbols (or hacks like vnet/dpcpu that rely on symbol and relocation abuse).
Comment 11 Michael Tuexen freebsd_committer freebsd_triage 2022-05-23 07:14:54 UTC
(In reply to Jessica Clarke from comment #6)
I just tested that the problem does not show up if you build the CC module in the kernel using the CC_HTCP kernel option.
Comment 12 Michael Tuexen freebsd_committer freebsd_triage 2022-05-23 07:16:55 UTC
Resigning as assignee since it is a clang issue, not a networking issue.
Comment 13 Michael Tuexen freebsd_committer freebsd_triage 2022-05-23 21:39:39 UTC
Just another data point: Disabling VNET also works around the problem.
Comment 14 Jessica Clarke freebsd_committer freebsd_triage 2022-05-23 21:40:33 UTC
Yes, it's a problem with how VNET (and DPCPU) are designed
Comment 15 Kubilay Kocak freebsd_committer freebsd_triage 2022-05-24 03:10:09 UTC
(In reply to Michael Tuexen from comment #12)

Any details on that isolation that could help toolchain@ with root causing and resolution Michael?
Comment 16 Michael Tuexen freebsd_committer freebsd_triage 2022-05-24 19:34:24 UTC
(In reply to Kubilay Kocak from comment #15)
I think I gave all information I can provide. I'm not familiar with the internals for clang...
Comment 17 Marek Zarychta 2022-10-30 11:33:14 UTC
Fresh "FreeBSD 14.0-CURRENT #7 main-n258920-f0a15aafcb8: Sun Oct 30 09:44:53 CET 2022 arm64" built with "FreeBSD clang version 14.0.5 (https://github.com/llvm/llvm-project.git llvmorg-14.0.5-0-gc12386ae247c)" is not affected. At least I am no more able to reproduce this, so I am closing the bug.
Comment 18 Marek Zarychta 2022-10-30 11:43:36 UTC
Too few tests were run. The panic is still reproducible.
Comment 19 Mark Johnston freebsd_committer freebsd_triage 2022-12-12 17:32:22 UTC
(In reply to Jessica Clarke from comment #10)
Can you see offhand why the symbol (vnet_entry_htcp_adaptive_backoff) is preemptible in the first place?  For arm64 KLDs, VNET_DEFINE_STATIC is hacked up to omit the static qualifier.  But the linker changes the binding of the symbol:

cc_htcp.o:
73: 0000000000000000     4 OBJECT  GLOBAL DEFAULT    5 vnet_entry_htcp_adaptive_backoff
cc_htcp.ko:
35: 0000000000021520     4 OBJECT  LOCAL  DEFAULT   11 vnet_entry_htcp_adaptive_backoff

and local symbols have hidden visibility by default.  I'm not sure offhand why it's changed.  I guess -Bsymbolic would have that effect, but we don't specify it.
Comment 20 Jessica Clarke freebsd_committer freebsd_triage 2022-12-12 17:38:44 UTC
Isn't that a feature of our EXPORT_SYMS awk kmod_syms.awk | xargs -J% objcopy % dance?
Comment 21 Mark Johnston freebsd_committer freebsd_triage 2022-12-12 20:08:16 UTC
(In reply to Jessica Clarke from comment #20)
Oh, right.  So kmod_syms.awk is defeating the hack in VNET_DEFINE_STATIC.  That can be bandaged easily enough.

I tried implementing your suggestion of adding an explicit offset with each VNET definition.  It's a bit painful due to consumers doing things like:

VNET_DEFINE(LIST_HEAD(, foo), n) = LIST_HEAD_INITIALIZER();

I don't think it's possible to implement VNET_DEFINE this way without either outlawing static initializers (or making them an argument to the macro itself instead of using assignment), or outlawing anonymous types, i.e., the above would have to be:

LIST_HEAD(foo_list, foo);
VNET_DEFINE(struct foo_list, n) = LIST_HEAD_INITIALIZER();

I would go with the second option, but it does require a bunch of consumers to be patched.

I haven't yet compared the generated code with what we had before.
Comment 22 Jessica Clarke freebsd_committer freebsd_triage 2022-12-12 20:36:19 UTC
Using

    __attribute__((used,section("set_vnet_entry"),alias("vnet_entry_"#_n)))
    static _t vnet_alias_##_n;

as a hacky way to forward-declare the thing without the compiler complaining seems to work.
Comment 23 Jessica Clarke freebsd_committer freebsd_triage 2022-12-12 21:01:05 UTC
Or you can do typedef _t vnet_type_#_n; and use that everywhere that follows so you can use a normal tentative definition to forward-declare.
Comment 24 Mark Johnston freebsd_committer freebsd_triage 2022-12-12 22:20:11 UTC
(In reply to Jessica Clarke from comment #23)
I had tried the second suggestion, but that doesn't quite work since "n" can be an array, as in VNET_DEFINE(struct hhook_head *, ipsec_hhh_in[HHOOK_IPSEC_COUNT]).

Of course we could add VNET_DEFINE_ARRAY and so on.  Maybe that's better anyway.  Do you see a way to work around the problem without introducing something like that?
Comment 25 Jessica Clarke freebsd_committer freebsd_triage 2022-12-12 22:29:20 UTC
Oh arrays like that are a non-starter as you wouldn't want vnet_offset_foo[N]... so yeah probably the interface needs reworking, not just the implementation, if you want a "proper" approach.
Comment 26 commit-hook freebsd_committer freebsd_triage 2023-07-23 22:39:10 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=80e4ac2964a11edef456a15b77e43aadeaf273a2

commit 80e4ac2964a11edef456a15b77e43aadeaf273a2
Author:     Dimitry Andric <dim@FreeBSD.org>
AuthorDate: 2023-07-23 13:48:36 +0000
Commit:     Dimitry Andric <dim@FreeBSD.org>
CommitDate: 2023-07-23 22:35:04 +0000

    Work around VNET and DPCPU related panics on aarch64

    lld >= 14 and recent GNU ld can relax adrp+add and adrp+ldr
    instructions, which breaks VNET and DPCPU when used in modules.

    Until VNET and DPCPU can be fixed to deal with these relaxed
    instructions, disable linker relaxation for now.

    PR:             264094
    Reviewed by:    markj
    MFC after:      3 days
    Differential Revision: https://reviews.freebsd.org/D41156

 sys/conf/kmod.mk | 7 +++++++
 1 file changed, 7 insertions(+)
Comment 27 commit-hook freebsd_committer freebsd_triage 2023-07-26 18:06:50 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=f1f27dfa4ad9da53fcdfc6aae644fab83bda4e76

commit f1f27dfa4ad9da53fcdfc6aae644fab83bda4e76
Author:     Dimitry Andric <dim@FreeBSD.org>
AuthorDate: 2023-07-23 13:48:36 +0000
Commit:     Dimitry Andric <dim@FreeBSD.org>
CommitDate: 2023-07-26 18:02:37 +0000

    Work around VNET and DPCPU related panics on aarch64

    lld >= 14 and recent GNU ld can relax adrp+add and adrp+ldr
    instructions, which breaks VNET and DPCPU when used in modules.

    Until VNET and DPCPU can be fixed to deal with these relaxed
    instructions, disable linker relaxation for now.

    PR:             264094
    Reviewed by:    markj
    MFC after:      3 days
    Differential Revision: https://reviews.freebsd.org/D41156

    (cherry picked from commit 80e4ac2964a11edef456a15b77e43aadeaf273a2)

 sys/conf/kmod.mk | 7 +++++++
 1 file changed, 7 insertions(+)
Comment 28 commit-hook freebsd_committer freebsd_triage 2023-07-26 18:06:53 UTC
A commit in branch stable/12 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=98e7f836e65ee413cd0b383371a6aca0115084ed

commit 98e7f836e65ee413cd0b383371a6aca0115084ed
Author:     Dimitry Andric <dim@FreeBSD.org>
AuthorDate: 2023-07-23 13:48:36 +0000
Commit:     Dimitry Andric <dim@FreeBSD.org>
CommitDate: 2023-07-26 18:03:46 +0000

    Work around VNET and DPCPU related panics on aarch64

    lld >= 14 and recent GNU ld can relax adrp+add and adrp+ldr
    instructions, which breaks VNET and DPCPU when used in modules.

    Until VNET and DPCPU can be fixed to deal with these relaxed
    instructions, disable linker relaxation for now.

    PR:             264094
    Reviewed by:    markj
    MFC after:      3 days
    Differential Revision: https://reviews.freebsd.org/D41156

    (cherry picked from commit 80e4ac2964a11edef456a15b77e43aadeaf273a2)

 sys/conf/kmod.mk | 7 +++++++
 1 file changed, 7 insertions(+)
Comment 29 Dimitry Andric freebsd_committer freebsd_triage 2023-07-28 11:29:33 UTC
Shall we close this, or do you think an Erratum is necessary? From what I gather, 13.2-RELEASE images would be affected by this bug?
Comment 30 Jessica Clarke freebsd_committer freebsd_triage 2023-07-28 16:32:38 UTC
13.2-R definitely needs an EN and patch, yes
Comment 31 Dimitry Andric freebsd_committer freebsd_triage 2023-07-28 18:44:54 UTC
I have sent an EN request to secteam@, and CC'd most of you on this bug here.
Comment 32 commit-hook freebsd_committer freebsd_triage 2023-08-01 20:05:18 UTC
A commit in branch releng/13.2 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=e3e6fc3713229e5075473b9e0e9a66699e8a322b

commit e3e6fc3713229e5075473b9e0e9a66699e8a322b
Author:     Dimitry Andric <dim@FreeBSD.org>
AuthorDate: 2023-07-23 13:48:36 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2023-08-01 19:50:47 +0000

    Work around VNET and DPCPU related panics on aarch64

    lld >= 14 and recent GNU ld can relax adrp+add and adrp+ldr
    instructions, which breaks VNET and DPCPU when used in modules.

    Until VNET and DPCPU can be fixed to deal with these relaxed
    instructions, disable linker relaxation for now.

    PR:             264094
    Reviewed by:    markj
    MFC after:      3 days
    Differential Revision: https://reviews.freebsd.org/D41156

    (cherry picked from commit 80e4ac2964a11edef456a15b77e43aadeaf273a2)
    (cherry picked from commit f1f27dfa4ad9da53fcdfc6aae644fab83bda4e76)

    Approved by:    so
    Security:       FreeBSD-EN-23:08.vnet

 sys/conf/kmod.mk | 7 +++++++
 1 file changed, 7 insertions(+)
Comment 33 Ed Maste freebsd_committer freebsd_triage 2023-08-01 20:34:09 UTC
errata issued