Bug 258751

Summary: race between pfi_kkif_update() and if_addgroup()
Product: Base System Reporter: Mark Johnston <markj>
Component: kernAssignee: freebsd-pf (Nobody) <pf>
Status: New ---    
Severity: Affects Only Me CC: emaste, kp, pi
Priority: ---    
Version: CURRENT   
Hardware: Any   
OS: Any   

Description Mark Johnston freebsd_committer freebsd_triage 2021-09-27 19:32:08 UTC
We have the following syzbot report:
https://syzkaller.appspot.com/bug?id=6698f3333928ca407039fd0bbe9238983681936f

The crash happened right after the fuzzer started, at a point where it would have been creating a number of tun interfaces for packet injection.  I suspect the following race is possible:

- Thread 1 creates a tun interface, which is automatically added to an interface group.  if_addgroup() creates the new interface group and adds the new interface to it.  Then it drops locks before generating the group_attach_event event, and is preempted.
- Thread 2 creates a second tun interface.  The group is already created so the group_attach_event event handlers do not run.
- Thread 2 assigns an interface address.  This runs ifaddr_event event handlers, including pfi_ifaddr_event(), which calls pfi_kkif_update().  pfi_kkif_update() recurses and is called on the new interface's containing groups, including the one that thread 1 is still constructing.
- Thread 2 panics: thread 1 has created the group but the ifg_pf_kif field is uninitialized.

We could modify if_addgroup() to initialize the ifg_pf_kif field to NULL, and modify pfi_kkif_update() to handle a NULL ifg_pf_kif, but I'm not sure that this is correct.
Comment 1 Kristof Provost freebsd_committer freebsd_triage 2021-09-29 15:19:59 UTC
That looks like a plausible issue, yes.

Arguably we should be running the event handler under the IFNET_WLOCK to ensure this sort of order reversal doesn't happen. I'm sure that'd have all sorts of other problems though.

Initialising ifg_pf_kif to NULL and skipping such groups in pfi_kkif_update() would at least avoid the panic, but I suspect it'd leave pf subtly out of sync with the real interface state (and address assignments), which is going to cause very subtle and impossible to debug problems of its own. We'd probably avoid those if we called pfi_kkif_update() from pfi_kkif_attach(), but the last time I touched if groups for pf it took a year for the fallout to settle.