We have the following syzbot report:
The crash happened right after the fuzzer started, at a point where it would have been creating a number of tun interfaces for packet injection. I suspect the following race is possible:
- Thread 1 creates a tun interface, which is automatically added to an interface group. if_addgroup() creates the new interface group and adds the new interface to it. Then it drops locks before generating the group_attach_event event, and is preempted.
- Thread 2 creates a second tun interface. The group is already created so the group_attach_event event handlers do not run.
- Thread 2 assigns an interface address. This runs ifaddr_event event handlers, including pfi_ifaddr_event(), which calls pfi_kkif_update(). pfi_kkif_update() recurses and is called on the new interface's containing groups, including the one that thread 1 is still constructing.
- Thread 2 panics: thread 1 has created the group but the ifg_pf_kif field is uninitialized.
We could modify if_addgroup() to initialize the ifg_pf_kif field to NULL, and modify pfi_kkif_update() to handle a NULL ifg_pf_kif, but I'm not sure that this is correct.
That looks like a plausible issue, yes.
Arguably we should be running the event handler under the IFNET_WLOCK to ensure this sort of order reversal doesn't happen. I'm sure that'd have all sorts of other problems though.
Initialising ifg_pf_kif to NULL and skipping such groups in pfi_kkif_update() would at least avoid the panic, but I suspect it'd leave pf subtly out of sync with the real interface state (and address assignments), which is going to cause very subtle and impossible to debug problems of its own. We'd probably avoid those if we called pfi_kkif_update() from pfi_kkif_attach(), but the last time I touched if groups for pf it took a year for the fallout to settle.