Bug 232878

Summary: File sealing
Product: Base System
Component: kern
Status: New
Severity: Affects Only Me
Priority: ---
Version: CURRENT
Hardware: Any
OS: Any
Reporter: Simon Ser <contact>
Assignee: freebsd-bugs (Nobody) <bugs>
CC: cem, crest, crest, debdrup, emaste

Description Simon Ser 2018-11-01 07:41:04 UTC
File sealing is a Linux-specific safety mechanism that can be used when sharing memory between two processes.

In this scenario, one process typically calls shm_open(SHM_ANON), mmaps the result into its address space, writes interesting things in this slice of memory, and sends the file descriptor over a Unix socket to another process. The other process then mmaps the file descriptor into its own address space and reads the shared memory.
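
The flow described above can be sketched in a few lines. This is a minimal single-process illustration in Python, not code from any compositor: an unlinked temporary file stands in for shm_open(SHM_ANON) (an assumption made for portability), both ends of the socket pair live in the same process, and the helper names send_fd, recv_fd and share_roundtrip are hypothetical.

```python
import array
import mmap
import os
import socket
import tempfile


def send_fd(sock, fd):
    """Pass one file descriptor over a Unix socket using SCM_RIGHTS."""
    fds = array.array("i", [fd])
    sock.sendmsg([b"F"], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds)])


def recv_fd(sock):
    """Receive one file descriptor that was sent with send_fd()."""
    fds = array.array("i")
    _, ancdata, _, _ = sock.recvmsg(1, socket.CMSG_LEN(fds.itemsize))
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            fds.frombytes(data[: len(data) - (len(data) % fds.itemsize)])
    return fds[0]


def share_roundtrip(payload):
    """Write payload through one mapping, read it back through another."""
    size = mmap.PAGESIZE
    client, server = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        # "Client" side: create the file, map it, fill it, send the fd.
        # The temporary file is an assumed stand-in for shm_open(SHM_ANON).
        with tempfile.TemporaryFile() as f:
            f.truncate(size)
            with mmap.mmap(f.fileno(), size) as buf:
                buf[: len(payload)] = payload
            send_fd(client, f.fileno())
        # "Server" side: receive the fd, map it read-only, read the data.
        fd = recv_fd(server)
        try:
            with mmap.mmap(fd, size, prot=mmap.PROT_READ) as view:
                return bytes(view[: len(payload)])
        finally:
            os.close(fd)
    finally:
        client.close()
        server.close()
```

Note that the receiver has to trust the size it maps: nothing in this exchange stops the sender from resizing the file afterwards.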

Sometimes the two processes don't trust each other, for instance in the case of Wayland. Bad clients may try to crash the compositor.

One way to crash the compositor is to send a shared memory file descriptor and then shrink the file. When the compositor tries to read the now-unmapped part of the file it'll receive SIGBUS.
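
The crash is easy to reproduce in isolation. The sketch below is only an illustration and makes two assumptions: the mapping and the truncation happen in the same throwaway child process (in the real attack the truncation is done by the client), and a temporary file stands in for the shared memory object. The names VICTIM and run_victim are hypothetical.

```python
import signal
import subprocess
import sys

# Script run in a child process so that the fault does not kill the caller.
VICTIM = """
import mmap, os, tempfile
with tempfile.TemporaryFile() as f:
    f.truncate(mmap.PAGESIZE)
    view = mmap.mmap(f.fileno(), mmap.PAGESIZE, prot=mmap.PROT_READ)
    view[0]                      # fine: the page is backed by the file
    os.ftruncate(f.fileno(), 0)  # what a hostile client would do remotely
    view[0]                      # the page is no longer backed by the file
"""


def run_victim():
    """Return the child's exit status; a negative value is a signal number."""
    proc = subprocess.run(
        [sys.executable, "-c", VICTIM], stderr=subprocess.DEVNULL
    )
    return proc.returncode
```

On Linux the child is expected to be killed by SIGBUS, so run_victim() should return -signal.SIGBUS.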

The compositor currently works around this by handling SIGBUS and ignoring it if the fault comes from a memory slice mmapped from IPC. Apart from being a hack, this makes things complicated because:

* There are multiple Wayland interfaces that need to mmap a file descriptor sent over IPC. Collecting the list of IPC-mmapped regions is currently not possible with libwayland.
* Since SIGBUS is global state, handling it is difficult. Some other IPC mechanisms might need to add more regions to the list. Threads make this even more annoying.

See https://gitlab.freedesktop.org/wayland/wayland/issues/53#note_24663

I'd like to know if there are plans to add a feature similar to file sealing (https://lwn.net/Articles/591108/) in FreeBSD.
Comment 1 Jan Bramkamp 2018-11-01 11:27:45 UTC
If I remember correctly, OpenBSD has a clean solution to this. They added a flag to mmap (MAP_ZERO) that causes reads to the truncated part of the memory mapping to return zeros. I don't know what they do with writes.
Comment 2 Simon Ser 2018-11-01 18:02:25 UTC
Indeed you're right. OpenBSD has __MAP_NOFAULT, which is an extension flag for mmap.

Would you be interested in implementing something like this?
Comment 3 Simon Ser 2018-11-01 19:53:16 UTC
Some more information on the flag from Mark Kettenis:

> The flag is called __MAP_NOFAULT.  It is somewhat deliberately
> undocumented as we didn't want to encourage people too much to use it.
> The flag gets translated into UVM_ET_NOFAULT in OpenBSD's vm
> subsystem.  The idea is that if the pages backing the mapping are not
> available (i.e. if a file got truncated) it will be replaced with
> anonymous zero-filled memory.  In other words, if the mapped data for
> a specific memory page isn't available it would behave as if the page
> was mapped with the MAP_ANON flag.
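
To make the semantics quoted above concrete, here is a userspace emulation of what a reader would observe. It is not the OpenBSD flag and does not use mmap at all: it reads with pread and pads with zeros past the end of the file, which is an assumption chosen only so the behaviour can be shown portably. The names read_zero_filled and demo are hypothetical.

```python
import os
import tempfile


def read_zero_filled(fd, offset, length):
    """Read `length` bytes at `offset`, returning zeros past end of file."""
    chunks = []
    remaining = length
    while remaining > 0:
        chunk = os.pread(fd, remaining, offset)
        if not chunk:  # reached end of file
            break
        chunks.append(chunk)
        offset += len(chunk)
        remaining -= len(chunk)
    # Whatever the file could not supply is replaced by zero bytes.
    return b"".join(chunks) + b"\x00" * remaining


def demo():
    """Read the same range before and after the file is truncated."""
    with tempfile.TemporaryFile() as f:
        f.write(b"abcd")
        f.flush()
        before = read_zero_filled(f.fileno(), 0, 8)
        os.ftruncate(f.fileno(), 0)
        after = read_zero_filled(f.fileno(), 0, 8)
    return before, after
```

The reader never faults: after the truncation it simply sees zeros where the data used to be.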
Comment 4 Daniel Ebdrup Jensen freebsd_committer 2020-07-24 11:08:26 UTC
May I recommend looking at base r362769 and the commit log [1]?

[1]: https://freshbsd.org/search?q=seal*&project%5B%5D=freebsd&repository%5B%5D=src&branches%5B%5D=head&sort=commit_date
Comment 5 crest 2020-07-24 15:01:52 UTC
Is this functionality exposed through the FreeBSD syscall ABI or is it locked away behind the Linux ABI?
Comment 6 Conrad Meyer freebsd_committer 2020-07-24 15:13:43 UTC
It is exposed via FreeBSD syscall ABI.  The native ABI is shm_open() / shm_open2().  memfd_create(2) is available in FreeBSD ABI as an alias.
Comment 8 Simon Ser 2021-06-04 14:09:23 UTC
Linux is considering implementing something similar to OpenBSD's __MAP_NOFAULT: https://lore.kernel.org/linux-mm/1622792602-40459-1-git-send-email-mlin@kernel.org/