Summary: | [patch] Save only callee-saved registers in pcb | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | Base System | Reporter: | Dapeng Gao <peter> | ||||||
Component: | arm | Assignee: | Andrew Turner <Andrew> | ||||||
Status: | In Progress --- | ||||||||
Severity: | Affects Some People | CC: | Andrew, emaste, jfc, marklmi26-fbsd | ||||||
Priority: | --- | Keywords: | patch | ||||||
Version: | CURRENT | Flags: | linimon:
mfc-stable13?
|
||||||
Hardware: | arm64 | ||||||||
OS: | Any | ||||||||
Attachments: |
|
Description
Dapeng Gao
2022-03-20 20:58:03 UTC
Created attachment 232594 [details]
Patch
NOTE: Please review the attached patch (instead of the one pasted in comment #0) as the pasted patch is incomplete. (In reply to Dapeng Gao from comment #0) Looks to me like some of this code is for context switching, for example. If true, see: https://developer.arm.com/documentation/den0024/a/The-Memory-Management-Unit/Context-switching where, in part, it reports: QUOTE Exactly what has to be saved and restored varies between different operating systems, but typically a process context switch includes saving or restoring some or all of the following elements: general-purpose registers X0-X30. Advanced SIMD and Floating-point registers V0 - V31. Some status registers. TTBR0_EL1 and TTBR0. Thread Process ID (TPIDxxx) Registers. Address Space ID (ASID). . . . END QUOTE (In reply to Mark Millard from comment #3) These code are indeed related to context switching, but they are not expected to preserve all the listed registers. Instead, all changed functions (i.e. cpu_throw, cpu_switch, and savectx) are called by C code as normal C functions, and they only need to obey the C calling convention (i.e. preserving the callee-saved registers). (Also note that none of x0-x7) were preserved in the original code, which confirms the view that caller-saved registers need not be preserved. (In reply to Dapeng Gao from comment #4) Using an example to indicate my limited understanding: static void sched_throw_tail(struct thread *td) { mtx_assert(&sched_lock, MA_OWNED); KASSERT(curthread->td_md.md_spinlock_count == 1, ("invalid count")); cpu_throw(td, choosethread()); /* doesn't return */ } cpu_throw does not return, for example, so niether does sched_throw_tail. sched_throw_tail can not restore any context during its return activity because it never returns. The return context is some place else. Any related restore is via another mechanism. The patch uses ldp and stp with offsets that are odd multiples of 8 bytes. Are there any ARM v8 chips where ldp and stp are faster when accessing an aligned 16 byte block? (In reply to Mark Millard from comment #5) Hmm. I used to know some out the powerpc64 and powerpc context for such things but it is clear that the code is not as I mis-remember it. So either I've mixed up contexts or just plain have forgotten what I once knew. So just ignore my comments (other than quotes from documentation). I did notice that: https://developer.arm.com/documentation/den0024/a/The-ABI-for-ARM-64-bit-Architecture/Register-use-in-the-AArch64-Procedure-Call-Standard/Parameters-in-general-purpose-registers lists "Caller-saved temporary registers (X9-X15)" and x8 is in a category of its own there. x16, x17, and x18 are also special: "Registers with a special purpose (X8, X16-X18, X29, X30)". x18 is for platform ABI use, so its handling is ABI-choice specific from what I can tell. (x18 "is an additional temporary register on platforms that don't assign a special meaning to it". As I understand, FreeBSD has a defined use of x18.) The others were (including x29 again): "Argument registers (X0-X7)" "Callee-saved registers (X19-X29)" Out of this, the handling of x8, x16, x17, and x30 are not explicitly listed as either caller-saved, nor callee-saved, so are more volatile. But traps should not mess them up by the time the code is returned to. (Some types of traps happen at arbitrary points in the execution.) Anyway, sorry for the noise. Created attachment 232611 [details]
Grouping MSR/MRS and LD/ST together
This additional patch groups the MSR/MRS and LD/ST instructions together rather than interleaving them. This helps with processor stalls.
Forgot to tag this in the commit: https://cgit.freebsd.org/src/commit/?id=e605b87a9e75a7f693527f0aad8189ae9db20f16 Keyword: patch or patch-ready – in lieu of summary line prefix: [patch] * bulk change for the keyword * summary lines may be edited manually (not in bulk). Keyword descriptions and search interface: <https://bugs.freebsd.org/bugzilla/describekeywords.cgi> |