Bug 273929 - AArch64 machine-dependent code clobbers X0 in SIGTRAP from capsicum violations
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: arm64 Any
Importance: --- Affects Only Me
Assignee: freebsd-bugs (Nobody)
 
Reported: 2023-09-18 13:34 UTC by David Chisnall
Modified: 2023-09-19 17:50 UTC (History)

Attachments
Reproducer? (977 bytes, text/plain)
2023-09-18 16:14 UTC, Kyle Evans

Description David Chisnall freebsd_committer 2023-09-18 13:34:52 UTC
X0 is used as both the first argument register and the return value register.  If SIGTRAP is delivered to a process for a Capsicum violation, the mcontext_t delivered with the signal should contain the arguments so that software can trap and emulate the system calls.  This works on x86[-64] but on AArch64 X0 is overwritten before the system call handler is entered.  When emulating `open`, for example, the signal frame always sees the path as `(char*)94`, which makes emulation impossible.
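
The clobbering described above can be modelled in a few lines of portable C. This is only a sketch: `struct toy_frame` and the `toy_*` helpers are hypothetical stand-ins, not the kernel's trap frame or any FreeBSD API, and the model ignores everything about the real ABI except that x0 carries both the first argument and the return value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for the saved user registers x0..x30; not the kernel's trap frame. */
struct toy_frame {
    uint64_t x[31];
};

/* ECAPMODE is 94 on FreeBSD, which is the bogus "path" value seen in the report. */
#define TOY_ECAPMODE 94

/* Models the error return being written over the first argument in x0. */
static void
toy_set_error_retval(struct toy_frame *tf, int error)
{
    assert(tf != NULL);
    tf->x[0] = (uint64_t)error;
}

/* Models a SIGTRAP handler reading the path argument of open() from x0. */
static const char *
toy_handler_path(const struct toy_frame *tf)
{
    assert(tf != NULL);
    return (const char *)(uintptr_t)tf->x[0];
}
```

If `x[0]` is seeded with a real pointer and `toy_set_error_retval()` then runs, `toy_handler_path()` returns the error number cast to a pointer, while `x[8]` (the syscall number) is left alone.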

It would be nice if this could be fixed before 14.0 or be subject to an EN afterwards, since it makes capsicum trap-and-emulate behaviour unusable currently.
Comment 1 Kyle Evans freebsd_committer 2023-09-18 16:14:25 UTC
Created attachment 244999
Reproducer?

Minimal reproducer, maybe -- x[0] is already clobbered to ECAPMODE in cpu_set_syscall_retval() by the time we go to set up the mcontext.
Comment 2 David Chisnall freebsd_committer 2023-09-18 17:37:14 UTC
(In reply to Kyle Evans from comment #1)

I don't have a *minimal* reproducer, but I've been porting the Verona sandbox code to AArch64:

https://github.com/microsoft/verona-sandbox/pull/2

This works fine on FreeBSD/amd64, but on AArch64 the argument is clobbered.  I believe x86-64 clobbers the system call register, which is why we preserve that in si_syscall in the signal.  AArch64 puts the syscall number in x8, which is not clobbered.

I could work around this if the original x0 register were either provided in the siginfo or if it were provided in another caller-save register.  The ECAPMODE value needs to be provided after sigreturn, I presume it's not possible to insert it there?  

Copying x0 over x9 in the syscall enter routine would be fine, I think.
Comment 3 Kyle Evans freebsd_committer 2023-09-19 00:11:35 UTC
(In reply to David Chisnall from comment #2)

> I don't have a *minimal* reproducer, but I've been porting the Verona sandbox code to AArch64:

Right, sorry, I meant that I've attached what I believe to be a minimal reproducer of your report.

> I could work around this if the original x0 register were either provided in the siginfo or if it were provided in another caller-save register.  The ECAPMODE value needs to be provided after sigreturn, I presume it's not possible to insert it there?  
>
> Copying x0 over x9 in the syscall enter routine would be fine, I think.

I can't see any reason off-hand that cpu_fetch_syscall_args() couldn't stash a copy of x0 off in x9 to be used in set_mcontext().
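
As a rough illustration of that idea (not a patch against cpu_fetch_syscall_args() or set_mcontext()), the toy model below copies x0 into x9 on syscall entry before the error is written to x0. Every name here is hypothetical, and the frame is a plain array standing in for the saved registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for the saved user registers x0..x30; not the kernel's trap frame. */
struct toy_frame {
    uint64_t x[31];
};

/* Hypothetical model of syscall entry: keep a copy of the first argument in x9. */
static void
toy_syscall_enter(struct toy_frame *tf)
{
    assert(tf != NULL);
    tf->x[9] = tf->x[0];
}

/* Models the error return being written over x0. */
static void
toy_set_error_retval(struct toy_frame *tf, int error)
{
    assert(tf != NULL);
    tf->x[0] = (uint64_t)error;
}

/* A rejected syscall: stash the argument first, then report the error. */
static void
toy_failed_syscall(struct toy_frame *tf, int error)
{
    toy_syscall_enter(tf);
    toy_set_error_retval(tf, error);
}
```

After `toy_failed_syscall()`, x0 holds the error and x9 still holds the original first argument, which is what a signal handler would need to see in the mcontext.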

I can't imagine a situation where the error (be it ENOSYS, ECAPMODE) really matters that much, but if it did we could presumably also fence off x10 as a flag indicating whether x0 has been set to the return value or not and preserve that in the mcontext.
Comment 4 David Chisnall freebsd_committer 2023-09-19 08:54:01 UTC
(In reply to Kyle Evans from comment #3)

> I can't imagine a situation where the error (be it ENOSYS, ECAPMODE) really matters that much, but if it did we could presumably also fence off x10 as flag indicating whether x0 has been set to the return value or not and preserve that in the mcontext.

Part of the problem is that I haven't managed to merge the SIGCAP patches (help there from someone who understands the signal delivery mechanism would be welcome!) and so currently SIGTRAP is overloaded here to mean both:

 - Here is a signal that the program can catch and handle and use to emulate.
 - Here is something that the debugger can watch for and help when debugging Capsicum violations.

For the second case, x0 needs to contain ENOTCAPABLE / ECAPMODE so that code around it can work well with graceful fallback (e.g. we're failing because of cap mode, try an openat thing instead).  For the first case, we want the original x0 and *may* replace the return value with a success value if we emulate correctly but may also pass it on for the wrapped code to try its own emulation.

The nice thing is that anything in the first category is definitely writing architecture-specific (and OS-specific) code and so can look in x9.  I'm not sure when the extra value in x10 is useful.  If x9 is set unconditionally for capsicum-triggered SIGTRAP in newer kernels, you still can't use a value in x10 to detect whether it's happened because, if it hasn't, x10 may still have whatever value you choose as a marker, left over from whatever the caller put in there.
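
The stale-marker problem can be shown with a small model. This is a sketch with made-up names: `TOY_MARKER` is an arbitrary value, the "old kernel" is assumed to leave x9 and x10 untouched, and the "new kernel" is assumed to set both.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for the saved user registers x0..x30; not the kernel's trap frame. */
struct toy_frame {
    uint64_t x[31];
};

/* Arbitrary marker value; any choice could also be left in x10 by the caller. */
#define TOY_MARKER 1

/* Assumed old behaviour: only x0 is overwritten with the error. */
static void
toy_old_kernel_fail(struct toy_frame *tf, int error)
{
    assert(tf != NULL);
    tf->x[0] = (uint64_t)error;
}

/* Assumed new behaviour: stash x0 in x9, set the marker in x10, then set the error. */
static void
toy_new_kernel_fail(struct toy_frame *tf, int error)
{
    assert(tf != NULL);
    tf->x[9] = tf->x[0];
    tf->x[10] = TOY_MARKER;
    tf->x[0] = (uint64_t)error;
}

/* A handler that tries to use x10 to decide whether x9 is trustworthy. */
static int
toy_handler_trusts_x9(const struct toy_frame *tf)
{
    assert(tf != NULL);
    return tf->x[10] == TOY_MARKER;
}
```

If the caller happened to leave `TOY_MARKER` in x10 before the syscall, `toy_handler_trusts_x9()` returns 1 on the old kernel too, even though x9 holds unrelated data there.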
Comment 5 David Chisnall freebsd_committer 2023-09-19 08:55:29 UTC
I'm not sure what the modification of the exception link register (`mc->mc_gpregs.gp_elr`) is for.  In the signal handler, the PC is already set to the instruction after the syscall, so you shouldn't need to skip any instructions.
Comment 6 Kyle Evans freebsd_committer 2023-09-19 14:20:04 UTC
(In reply to David Chisnall from comment #5)

heh, sorry, yeah- that was a remnant from playing with other signal scenarios from a while ago where elr's not necessarily advanced (e.g., __builtin_trap())
Comment 7 Kyle Evans freebsd_committer 2023-09-19 15:43:03 UTC
(In reply to David Chisnall from comment #4)

re: x10, I was mainly proposing its use within the kernel to decide what it needs to do with x0/x9, not something that an application can rely on. I'm not sure that we have anywhere else useful in the trapframe to smuggle that kind of indication through from beginning to end of syscall handling.
Comment 8 David Chisnall freebsd_committer 2023-09-19 15:59:31 UTC
(In reply to Kyle Evans from comment #7)
I think if syscall entry copies x0 to x9 in the trap frame, then no other code needs to care.  The calling convention for the syscall says that x9 is allowed to be clobbered, so nothing on the caller side may rely on it being stable.  If there isn't a signal, it's just a caller-save register that changed as permitted.  If there is a signal, the signal handler can pick it up.
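
On the handler side, the trap-and-emulate flow could then look roughly like the sketch below. It assumes the proposed behaviour (original first argument in x9, error in x0) and uses hypothetical names throughout; it operates on a toy register array rather than a real mcontext_t, and it ignores any other state the real ABI uses to signal an error.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for the saved user registers x0..x30; not a real mcontext_t. */
struct toy_frame {
    uint64_t x[31];
};

/* Hypothetical emulator: returns a value >= 0 on success, or -1 to decline. */
typedef int64_t (*toy_emulator)(uint64_t first_arg);

/* Example emulator that always succeeds and returns a made-up descriptor. */
static int64_t
toy_emulate_ok(uint64_t first_arg)
{
    (void)first_arg;
    return 3;
}

/* Example emulator that declines, leaving the error for the wrapped code. */
static int64_t
toy_emulate_decline(uint64_t first_arg)
{
    (void)first_arg;
    return -1;
}

/* Emulate using the stashed argument; replace x0 only when emulation succeeds. */
static void
toy_handle_sigtrap(struct toy_frame *tf, toy_emulator emulate)
{
    assert(tf != NULL && emulate != NULL);
    int64_t result = emulate(tf->x[9]);
    if (result >= 0)
        tf->x[0] = (uint64_t)result;
    /* Otherwise x0 keeps the error so the caller can try its own fallback. */
}
```

With `toy_emulate_ok`, x0 ends up holding the emulated result; with `toy_emulate_decline`, x0 still holds the original error value.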