Tightening up permission on some pages in kernel memory (amd64 only)
The item was proposed as amd64-only at the DevSummit, but we should also apply the work to other architectures as well -- arm64 in particular.
I have removed the amd64 only designation from the summary. I believe jtl@ is on point for the amd64 code, who can we get to be on point for arm64?
Any particular pages in mind?
For wider audience, what is behind this title?
(In reply to Conrad Meyer from comment #3) (In reply to op from comment #4) This PR was created to track an item mentioned during the FreeBSD Developer Summit held with BSDCan 2018; there are no more specific details at present.
(In reply to Conrad Meyer from comment #3) > Any particular pages in mind? (In reply to op from comment #4) > For wider audience, what is behind this title? At BSDCan, they asked for work that people were doing that they thought was "important to get into 12.0". rgrimes@ created PRs to track those items. This is one of those items. I (and others) have been working on trying to tighten up the memory permissions in amd64 to make them more accurately track the actual use. These are the changes that have already been committed, or which I have in mind for 12.0: 1. Kernel text, data, and BSS permissions (see r330539). 2. Direct map permissions (see r316767 [by kib]). 3. Recursive page table mappings (see r330511). 4. Kernel stacks (see r329281 [by cem]). 5. Allocated memory (both malloc and UMA allocated) (see https://reviews.freebsd.org/D15691) 6. Kernel module text, data, and BSS permissions (in progress) (This is not an exhaustive list. I apologize to anyone whose change I missed. I don't intend to minimize the importance of changes not on the above list.) Of the above, 1-4 are amd64-specific. They should probably be replicated on other platforms that support memory permission restrictions and don't already have this kind of protection. Number 5 is machine-indepedent. It looks like number 6 will probably have both machine-specific and machine-independent parts.
Thanks! So a lot of this work is already done, and we have concrete ideas of a few remaining areas that could be improved for 12.
A commit references this bug: Author: jtl Date: Wed Jun 13 17:04:45 UTC 2018 New revision: 335068 URL: https://svnweb.freebsd.org/changeset/base/335068 Log: Make UMA and malloc(9) return non-executable memory in most cases. Most kernel memory that is allocated after boot does not need to be executable. There are a few exceptions. For example, kernel modules do need executable memory, but they don't use UMA or malloc(9). The BPF JIT compiler also needs executable memory and did use malloc(9) until r317072. (Note that a side effect of r316767 was that the "small allocation" path in UMA on amd64 already returned non-executable memory. This meant that some calls to malloc(9) or the UMA zone(9) allocator could return executable memory, while others could return non-executable memory. This change makes the behavior consistent.) This change makes malloc(9) return non-executable memory unless the new M_EXEC flag is specified. After this change, the UMA zone(9) allocator will always return non-executable memory, and a KASSERT will catch attempts to use the M_EXEC flag to allocate executable memory using uma_zalloc() or its variants. Allocations that do need executable memory have various choices. They may use the M_EXEC flag to malloc(9), or they may use a different VM interfact to obtain executable pages. Now that malloc(9) again allows executable allocations, this change also reverts most of r317072. PR: 228927 Reviewed by: alc, kib, markj, jhb (previous version) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D15691 Changes: head/share/man/man9/malloc.9 head/share/man/man9/zone.9 head/sys/amd64/amd64/bpf_jit_machdep.c head/sys/i386/i386/bpf_jit_machdep.c head/sys/kern/kern_malloc.c head/sys/kern/subr_vmem.c head/sys/net/bpf_jitter.c head/sys/net/bpf_jitter.h head/sys/sys/malloc.h head/sys/vm/uma.h head/sys/vm/uma_core.c head/sys/vm/vm_extern.h head/sys/vm/vm_init.c head/sys/vm/vm_kern.c head/sys/vm/vm_kern.h head/sys/vm/vm_pagequeue.h
Have we thought about APIs to allow M_EXEC consumers to drop write permissions after they have initialized their executable allocation (i.e., the "W^X" concept)?
Also pipe_map and exec_map by markj in r340205 https://reviews.freebsd.org/D17827
(In reply to Jonathan T. Looney from comment #6) Item 6 was resolved by r353729 and some related revisions. We now also have the vm.pmap.kernel_maps sysctl to audit permissions of kernel mappings. This made it easy to identify and fix a recent issue described in r365796's commit log, for example. (In reply to Conrad Meyer from comment #9) On amd64 this can be done using pmap_change_prot(). pmap_protect() also updates permissions but is really intended for user pmaps. I think the amd64 BPF JIT code could drop PROT_WRITE like it does when running in user mode. I'll add this to my todo list; it requires some thought because the BPF code has to restore original permissions before freeing memory, otherwise the direct map alias will be left with non-default permissions. I think the original goals of the PR have been met though and so it can be closed. Please re-open if you disagree.