Bug 209475

Summary: pf didn't check if enough free RAM for net.pf.states_hashsize
Product: Base System Reporter: Olivier Cochard <olivier>
Component: kernAssignee: freebsd-pf mailing list <pf>
Status: New ---    
Severity: Affects Only Me CC: david.marec, farrokhi, fnoyanisi, kp
Priority: ---    
Version: 10.3-RELEASE   
Hardware: Any   
OS: Any   
Attachments:
Description Flags
Patch for the proposed fix.
none
The latest patch with console and logging facility outputs
none
patch that uses mallocarray(9) and printf(9)
none
Kernel Crash Dump
none
patch with mallocarray(9), printf(9) and a single fallback block none

Description Olivier Cochard freebsd_committer 2016-05-12 23:03:40 UTC
I'm playing with benching some pf limits and I've found an easy way to block kldload pf.ko:

We just need to set a net.pf.states_hashsize value too big for the system available RAM.
This table is a static table with each entry consuming about 80 bytes of RAM.
Manual page ask to use power of 2 values, then I've try this big value on my 8GB RAM server:

echo 'net.pf.states_hashsize="67108864"' >>/boot/loader.conf
Notice that 67108864 * 80 should be close to 5 GB of RAM reserved.

Then when I run a "service pf onestart", this start a "kldload pf.ko", but kldload stuck and never give the console back on my 8GB RAM server (we can escape with a Ctrl+Z).
But if pf is enabled in rc.conf, this prevent the system to correctly boot.

A Ctrl+D give this kind of information:

load: 0.02  cmd: kldload 1166 [kmem arena] 180.93r 0.00u 0.00s 0% 1700k
load: 0.02  cmd: kldload 1166 [kmem arena] 182.69r 0.00u 0.00s 0% 1700k
load: 0.02  cmd: kldload 1166 [kmem arena] 182.96r 0.00u 0.00s 0% 1700k
load: 0.02  cmd: kldload 1166 [kmem arena] 183.50r 0.00u 0.00s 0% 1700k

During this status, issuing command "pfctl" will crash the system:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x0
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff81a311b4
stack pointer           = 0x28:0xfffffe00003aba60
frame pointer           = 0x28:0xfffffe00003ac7f0
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 1869 (pfctl)
trap number             = 12
panic: page fault
cpuid = 0
KDB: stack backtrace:
#0 0xffffffff807f5c70 at kdb_backtrace+0x60
#1 0xffffffff807bc276 at vpanic+0x126
#2 0xffffffff807bc143 at panic+0x43
#3 0xffffffff80b4874b at trap_fatal+0x36b
#4 0xffffffff80b48a4d at trap_pfault+0x2ed
#5 0xffffffff80b480ca at trap+0x47a
#6 0xffffffff80b2e2c2 at calltrap+0x8
#7 0xffffffff80715779 at devfs_ioctl_f+0x139
#8 0xffffffff8081013f at kern_ioctl+0x20f
#9 0xffffffff8080fe8c at sys_ioctl+0x15c
#10 0xffffffff80b490b8 at amd64_syscall+0x3a8
#11 0xffffffff80b2e5ab at Xfast_syscall+0xfb
Uptime: 1m44s
Comment 1 david.marec 2016-05-13 20:27:23 UTC
While  investigating, Olivier has suspected the hash table allocation to be responsible of the loading freeze.

by the fact, the allocation process within  pf_initialize() (sys/netpfil/pf/pf.c) function sets the "M_WAITOK" flag. This way, the initialization process might hang to wait for the requested resources.
Comment 2 Olivier Cochard freebsd_committer 2016-05-13 20:52:47 UTC
If I configure net.pf.states_hashsize to 134,217,728, this mean using 10GB of RAM on my 8GB only server, it will directly panic:
KDB: stack backtrace:
#0 0xffffffff807ecbb0 at kdb_backtrace+0x60
#1 0xffffffff807b4086 at vpanic+0x126
#2 0xffffffff807b3f53 at panic+0x43
#3 0xffffffff809b9b7b at vm_fault_hold+0x1bbb
#4 0xffffffff80b3d10c at trap_pfault+0x19c
#5 0xffffffff80b3c8fa at trap+0x47a
#6 0xffffffff80b22de2 at calltrap+0x8
#7 0xffffffff81811509 at pf_initialize+0x1c9
#8 0xffffffff81826775 at pf_modevent+0x125
#9 0xffffffff8079e0db at module_register_init+0xfb
#10 0xffffffff80793887 at linker_load_module+0xc07
#11 0xffffffff80794cd3 at kern_kldload+0xc3
#12 0xffffffff80794dab at sys_kldload+0x5b
#13 0xffffffff80b3d817 at amd64_syscall+0x2f7
#14 0xffffffff80b230cb at Xfast_syscall+0xfb
Uptime: 19s
Comment 3 fehmi noyan isi 2016-05-29 21:38:17 UTC
Created attachment 170810 [details]
Patch for the proposed fix.

Patch file is created with
$ diff -u pf.c pf.c.fni > pf.c.diff

where pf.c.fni is the modified version of the pf.c
Comment 4 fehmi noyan isi 2016-05-29 21:39:01 UTC
Hi,

In this forum post [1] from David, there is a bit of discussion about this PR (apart from the original question).

Would checking the requested amount of memory by malloc(9) against the available RAM (obtained via sysctl hw.physmem) be a good approach to avoid this problem?

To test this, I setup two identical VMs (in fact, VM2 is the copied & renamed version of VM1 bhyve image) with 512MB of RAM, running FreeBSD-CURRENT, single CPU core.

VM1 has the pf.c that comes with the FreeBSD-CURRENT source, whereas VM2 has a patched version of the file, which performs a sanity check on the requested memory by malloc(9) against the RAM size. If the requested memory is more than the available memory, ph_hashsize is set to PF_HASHSIZ, which 32768.

On both VMs, I set net.pf.states_hashsize to 2147483648 (way more than the RAM on the VM) via /boot/loader.conf. pf(4) is loaded with service(8) command. With this configuration on each VM, VM1 fails to start pf(4), whereas VM2 is successful to load the kernel module with net.pf.states_hashsize set to 32768.

VM1
----------------------------------------------------------
vm1 # uname -a
FreeBSD test-pf 11.0-CURRENT FreeBSD 11.0-CURRENT #0 r297692: Fri Apr  8 03:07:13 UTC 2016  root@releng2.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC  amd64
vm1 # cat /boot/loader.conf
net.pf.states_hashsize: 2147483648
vm1 # sysctl hw.realmem
hw.realmem: 536870912
vm1 # kldstat
Id Refs Address  Size  Name
1  3 0xffffffff80200000 1ee2bc0  kernel
vm1 # service pf onestart
Enabling pfKernel page fault with the following non-sleepable locks held:
exclusive rw pf rulesets (pf rulesets) r = 0 (0xffffffff822466e0) locked @ /usr/src/sys/modules/pf/../../netpfil/pf/pf_ioctl.c:2901
stack backtrace:
#0 0xffffffff80a91a90 at witness_debugger+0x70
#1 0xffffffff80a92d77 at witness_warn+0x3d7
#2 0xffffffff80e92817 at trap_pfault+0x57
#3 0xffffffff80e91ea4 at trap+0x284
#4 0xffffffff80e71ea7 at calltrap+0x8
#5 0xffffffff8090b646 at devfs_ioctl_f+0x156
#6 0xffffffff80a97106 at kern_ioctl+0x246
#7 0xffffffff80a96e51 at sys_ioctl+0x171
#8 0xffffffff80e92f6b at amd64_syscall+0x2db
#9 0xffffffff80e7218b at Xfast_syscall+0xfb


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x0
fault code     = supervisor read data, page not present
instruction pointer   = 0x20:0xffffffff8223001f
stack pointer    = 0x28:0xfffffe002b724310
frame pointer    = 0x28:0xfffffe002b724800
code segment     = base rx0, limit 0xfffff, type 0x1b
       = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags   = interrupt enabled, resume, IOPL = 0
current process     = 635 (pfctl)
[ thread pid 635 tid 100074 ]
Stopped at  pfioctl+0x4ef:  movq  (%rdi),%rsi
db> 


VM2
----------------------------------------------------------
vm2 # uname -a
FreeBSD test-pf 11.0-CURRENT FreeBSD 11.0-CURRENT #2: Sun May 29 12:06:57 NZST 2016  test@test-pf:/usr/obj/usr/src/sys/GENERIC  amd64
vm2 # cat /boot/loader.conf
net.pf.states_hashsize="2147483648"
vm2 # sysctl hw.realmem
hw.realmem: 536870912
vm2 # kldstat
Id Refs Address  Size  Name
1  1 0xffffffff80200000 1ee2bc0  kernel
vm2 # service pf onestart
Enabling pf.
vm2 # kldstat
Id Refs Address  Size  Name
1  3 0xffffffff80200000 1ee2bc0  kernel
2  1 0xffffffff82219000 34c30  pf.ko

[1] https://forums.freebsd.org/threads/56250/
Comment 5 Olivier Cochard freebsd_committer 2016-06-03 07:54:33 UTC
Thanks: Your patch works great!
Could you add a log warning message when the not enough memory condition is triggered ?
Comment 6 fehmi noyan isi 2016-08-15 10:25:00 UTC
Hi,

Somehow, I missed the notification for the latest comment! Sorry for the late reply....

It is possible to add a log warning message by using printf(9) or log(9), but I have noticed that pf_initialize() does things quietly most of the times.

Do you reckon using printf(9) or log(9) would be the proper approach for adding a log warning message for the not enough memory condition?
Comment 7 fehmi noyan isi 2016-11-02 08:09:35 UTC
Created attachment 176410 [details]
The latest patch with console and logging facility outputs

Hi there, 

I setup another VM running FreeBSD 12.0-CURRENT and updated the previous patch to reflect Olivier's suggestion with regards to logging.

I used printf(9) to print an error message similar to this

pf: Invalid states hash table size : 2147483648. Using the default value 32768.

Not mentioned in the original bug report, but value of f_srchashsize also needs to be verified against the available memory as setting net.pf.source_nodes_hashsize tunable to a value larger than malloc(9) can allocate causes the system to crash.

The snippet below was captured with script(1) utility from a FreeBSD 12.0-CURRENT system patched with attached diff file (note that console output is not captured, only tty)

VM3
----------------------------------------------------------
Script started on Wed Nov  2 20:07:09 2016
root@vm3:~ # uname -a
FreeBSD vm3 12.0-CURRENT FreeBSD 12.0-CURRENT #2: Tue Nov  1 20:27:46 NZDT 2016     root@vm3:/usr/obj/usr/src/sys/GENERIC  amd64
root@vm3:~ # cat /boot/loader.conf
net.pf.states_hashsize="2147483648"
root@vm3:~ # sysctl hw.realmem
hw.realmem: 536805376
root@vm3:~ # kldstat
Id Refs Address            Size     Name
 1    3 0xffffffff80200000 1fdc408  kernel
 2    1 0xffffffff82419000 2ac85    vboxguest.ko
root@vm3:~ # service pf onestart
Enabling pf.
root@vm3:~ # kldstat
Id Refs Address            Size     Name
 1    5 0xffffffff80200000 1fdc408  kernel
 2    1 0xffffffff82419000 2ac85    vboxguest.ko
 3    1 0xffffffff82444000 351a7    pf.ko
root@vm3:~ # dmesg | grep pf
pf: Invalid states hash table size : 2147483648. Using the default value 32768.
pf: Invalid source node hash table size : 0. Using the default value 8192.
root@vm3:~ # exit
exit

Script done on Wed Nov  2 20:08:23 2016
----------------------------------------------------------
Comment 8 fehmi noyan isi 2018-01-01 00:48:44 UTC
Hi there,

Just checking whether the last submitted patch is an appropriate solution to the problem and satisfies the requirement set by Olivier in his comment on 2016-06-03.

Is there anything that is needed to be done / updated?

Thanks
Comment 9 Kristof Provost freebsd_committer 2018-01-03 12:48:02 UTC
(In reply to fehmi noyan isi from comment #8)
I don't think this is the correct approach. There are no guarantees that the memory would not be allocated for something else.

I currently don't have the time to look at this in more depth, but do feel free to chase me again in a couple of weeks.
Comment 10 fehmi noyan isi 2018-01-16 01:12:52 UTC
(In reply to Kristof Provost from comment #9)
Hi Kristof,

>>> There are no guarantees that the memory would not be allocated for something else.

If my interpretation of your comment is correct (and if not please correct me), you mean one should perform the memory check against the "free" memory rather than the installed memory, which is hw.physmem in this case.

However, I could not find any sysctl tunables that I can get the correct amount of free memory directly. Looking at top(1) source code, it requires extensive calculations to come up with the amount of wired, free or active memory.

Sticking with hw.physmem was the best option I could think of. Any other suggestions are welcome, so that I can work it out...
Comment 11 Kristof Provost freebsd_committer 2018-01-16 08:43:45 UTC
(In reply to fehmi noyan isi from comment #10)
Yes, but I wouldn't check the free memory.

I've not yet looked at the code, but the original report suggests there are a couple of related issues:

> kldload stuck and never give the console back on my 8GB RAM server
This suggests that pf is doing an allocation with M_WAITOK. This means the malloc() call will keep waiting until enough memory is found to satisfy the request. If that's for more memory than is available it'll never return

> During this status, issuing command "pfctl" will crash the system:
This suggests that there's a race in pf initialisation where the ioctl() handler is already registered before all initialisation is done (i.e. the large allocation is still blocked). That's a bug too, and one that could potentially occur even if the large allocation succeeds (though in a very small time window).

My initial approach would be to find the allocation that blocks startup, presumably where the state hash table is allocated, and remove the M_WAITOK flag. Then the code needs to be changed to ensure that it can handle an allocation failure.
The second step is to audit the initialisation code to find out why ioctls() can be done before the state hash table is allocated.
Comment 12 fehmi noyan isi 2018-01-16 09:23:13 UTC
(In reply to Kristof Provost from comment #11)
For the first issue you mentioned,

I think it is this line blocking the startup

V_pf_srchash = malloc(pf_srchashsize * sizeof(struct pf_srchash),M_PFHASH,M_WAITOK|M_ZERO);

then maybe something like below?

if ((V_pf_srchash = malloc(..., ... ,M_NOWAIT|M_ZERO)) == NULL) {
 /* error handling, cleanup & return */
}
Comment 13 Kristof Provost freebsd_committer 2018-01-16 09:46:07 UTC
(In reply to fehmi noyan isi from comment #12)
Yes, that's probably one of them.

There are a couple of allocations in pf_initialize() and they all use M_WAITOK.

I do see why, because it's non-trivial to cope with allocation failures in that part of the code. It gets called from a VNET_SYSINIT(), so it doesn't have a reasonable way of aborting the initialisation.

As an aside, the allocations should probably be changed to use mallocarray() too, because the multiplication they do (pf_hashsize * sizeof(pf_idhash) for example) might overflow. pf_hashsize can be set by the user (although only root, because it's a sysctl, fortunately), so we must verify it won't overflow before trying to allocate it.
Comment 14 fehmi noyan isi 2018-01-16 10:19:07 UTC
(In reply to Kristof Provost from comment #13)

>> I do see why, because it's non-trivial to cope with allocation failures in that part of the code. It gets called from a VNET_SYSINIT(), so it doesn't have a reasonable way of aborting the initialisation.

So, does this come down to supplying a default value and re-attempting malloc() again?

>> As an aside, the allocations should probably be changed to use mallocarray() too
Yes, that might help to avoid this bug, but I do not think FreeBSD has mallocarray()?
Comment 15 Kristof Provost freebsd_committer 2018-01-16 10:23:54 UTC
(In reply to fehmi noyan isi from comment #14)
> So, does this come down to supplying a default value and re-attempting malloc() again?

I was thinking in that direction as well, yes. It's either that, or not activating pf at all. Running it with a smaller state table might not be ideal, but it's bound to be a lot better than running without firewall at all.
 
> Yes, that might help to avoid this bug, but I do not think FreeBSD has mallocarray()?

It does now. It was added very recently (in head). man 9 mallocarray. It might be worth doing that change in a separate commit.
Comment 16 fehmi noyan isi 2018-01-16 22:20:11 UTC
(In reply to Kristof Provost from comment #15)
>> It does now. It was added very recently (in head). man 9 mallocarray. It might be worth doing that change in a separate commit.

This is news to me! A quick look at the head revealed quite a few commits that replaces malloc(9) with mallocarray(9). That's good...

However, I would say even the first problem you mentioned needs to be split into two as we can have cases where the requested amount of memory does not overflow but exceeds the available memory.

Attempting malloc() (or using mallocarray() in this situation would yield the same result) and checking return value, then in case NULL is returned, re-attempting another allocation does not sound like the best approach that could possibly used here.

How about the two-step process below?

1 ) making sure the requested allocation size does not exceed the system memory or the available free memory- we need to check this

2 ) avoiding overflow - mallocarray(9) would do this

which can be represented in pseudo-code as given below

pf_initiazlize(){
   /* the first check */
   alloc_size = check_if_we_have_enough_memory(pf_hashsize * sizeof(pf_idhash))

   /* it should be safe at this point, but better to be cautious than sad */
   V_pf_srchash = mallocarray(alloc_size, ... ,M_NOWAIT|M_ZERO)
}

size_t check_if_we_have_enough_memory(u_long size) {
   return (size > available_or_free_memory)? PF_HASHSIZ : size;
}

I think the second issue below can be addressed independently from the one I mentioned above.

>This suggests that there's a race in pf initialisation where the ioctl() handler is already registered before all initialisation is done (i.e. the large allocation is still blocked). That's a bug too, and one that could potentially occur even if the large allocation succeeds (though in a very small time window).
Comment 17 Kristof Provost freebsd_committer 2018-01-17 10:46:50 UTC
(In reply to fehmi noyan isi from comment #16)
Yes, absolutely. The potential for integer overflow is a separate issue.

I wouldn't try check_if_we_have_enough_memory() though. Memory allocation is very complicated and it's basically impossible to predict if there'd be enough free memory to satisfy a request without actually doing it. Moreover, it'd always be susceptible to races.

I'd try to allocate the requested size with malloc(size * sizeof(foo), M_NOWAIT). If that fails we can fall back on malloc(DEFAULT_SIZE * sizeof(foo), M_WAITOK). Even that allocation might block forever, but at that point the system is so low on memory that we'll be in all sorts of trouble.
Comment 18 fehmi noyan isi 2018-01-17 19:45:37 UTC
(In reply to Kristof Provost from comment #17)
Sounds a better approach than I suggested, thanks!

As the allocation size can be set by the user, don't you think mallocarray(9) would be a better choice than malloc(9) in this case?

I see that you mentioned using mallocarray(9) could be a separate commit, but once the code is updated to fix a memory allocation issue, it may pay to use the new system call we have. What do you think?
Comment 19 Kristof Provost freebsd_committer 2018-01-17 21:50:44 UTC
(In reply to fehmi noyan isi from comment #18)
Yes, mallocarray() would indeed be better. I'd suggest doing that in a separate commit mostly because mallocarray() currently only exists in head, and this patch may be useful in stable/11 as well.

It's not a hard requirement from my point of view though.
Comment 20 fehmi noyan isi 2018-01-17 21:54:02 UTC
ok, if a patch with malloc(9) could find its way into stable/11, it would pay to use malloc(9).
Comment 21 david.marec 2018-01-22 08:10:56 UTC
mallocarray() is now available for 11-Stable.
https://svnweb.freebsd.org/changeset/base/328210
Comment 22 fehmi noyan isi 2018-01-27 09:39:54 UTC
Hi there. I patched pf_initialize() in head (r328383, which has mallocarray(9)) by implementing a fallback mallocarray(9) call  (as discussed in the comments)

if ((V_pf_idhash =  mallocarray(pf_hashsize, sizeof(pf_idhash), M_PFHASH, M_WAITOK|M_ZERO)) == NULL) {
V_pf_idhash = mallocarray(PH_HASHSIZ, sizeof(pf_idhash), M_PFHASH, M_NOWAIT|M_ZERO)
}

// and all other malloc(9) calls

Although, this change prevents the initialisation process against any overflow issues, the fallback mallocarray(...,M_WAITOK) calls still block in case too much memory is requested. 

>> Memory allocation is very complicated and it's basically impossible to predict if there'd be enough free memory to satisfy a request without actually doing it. Moreover, it'd always be susceptible to races.

Based on Kristof's comment above, shall we assume this is as far as we can go?
Comment 23 Kristof Provost freebsd_committer 2018-01-27 10:25:00 UTC
(In reply to fehmi noyan isi from comment #22)
Yeah, that's probably the best we can do for the allocation. It'd also be good to log that we haven't allocated all of the requested memory.

That's part of the problem. Another part (for another patch) is that the pf ioctl() handler seems to get registered before these allocations are done.
Comment 24 fehmi noyan isi 2018-02-08 12:40:34 UTC
Created attachment 190429 [details]
patch that uses mallocarray(9) and printf(9)

I have been tied by other commitments, so could not send this patch, which I wrote about a a week or so ago.

This patch attempts to allocate requested memory by using mallocarray(9) with NO_WAIT flag and if the call fails, it falls back and makes another mallocarray(9) call. Second call, however, is done by using the default PF_HASHSIZ value and M_WAITOK flag.

Please note, even with this patch, pf.ko still crashes during initialisation phase if you set the value of net.pf.states_hashsize sysctl tunable large enough to consume all physical memory, something like 2147483648 on a machine with 512MB RAM (this check has been opted out during the discussions).

The patch also adds some logging mechanism, by using printf(9), in case any of the mallocarray(9) calls fails.

The patch was generated the pf.c found in 12.0-CURRENT
Comment 25 Kristof Provost freebsd_committer 2018-02-08 13:16:18 UTC
(In reply to fehmi noyan isi from comment #24)
I'd change the error message ('invalid states hashtable size(%lu)', and friends) to say there's not enough memory. It's not so much that the value is incorrect, just that it's more than we have memory for.

What crash are you seeing? Note that mallocarray() will panic if you trigger an integer overflow, which could well happen with 2147483648 on a 32-bit machine.
Comment 26 fehmi noyan isi 2018-02-10 06:58:26 UTC
Created attachment 190476 [details]
Kernel Crash Dump

(In reply to Kristof Provost from comment #25)

The VM I am running the tests is 64-bit,so I do not think the panic is triggered by mallocarray(9). However, I see that the mtx_init(9) in the for loop causes the crash.

I attach textdump output for your reference....

Here is the problem;

       if ((V_pf_keyhash = mallocarray(pf_hashsize, sizeof(struct pf_keyhash),
            M_PFHASH, M_NOWAIT | M_ZERO)) == NULL){
            V_pf_keyhash = mallocarray(PF_HASHSIZ, sizeof(struct pf_keyhash),
            M_PFHASH, M_WAITOK | M_ZERO);
            printf(...);
        }
        if ((V_pf_idhash = mallocarray(pf_hashsize, sizeof(struct pf_idhash),
            M_PFHASH, M_NOWAIT | M_ZERO)) == NULL){
            V_pf_idhash = mallocarray(PF_HASHSIZ, sizeof(struct pf_idhash),
            M_PFHASH, M_WAITOK | M_ZERO);
            printf(...);
        }
        pf_hashmask = pf_hashsize - 1; // pf_hashsize is 2147483648
        for (i = 0, kh = V_pf_keyhash, ih = V_pf_idhash; i <= pf_hashmask;
            i++, kh++, ih++) {
                mtx_init(&kh->lock, "pf_keyhash", NULL, MTX_DEF | MTX_DUPOK);
                mtx_init(&ih->lock, "pf_idhash", NULL, MTX_DEF);
        }

In the code above, V_ph_idhash and V_pf_keyhash are allocated PF_HASHSIZ * sizeof(struct pf_keyhash) and PF_HASHSIZ * sizeof(struct pf_idhash) amount of memory respectively. 

The for loop following the mallocarray(9) calls expects the allocated memory to be aligned with pf_hashsize variable, which is usd in the loop and set to 2147483648 in our example. On the other hand, PH_HASHSIZ is 32768. This mismatch causes the initialisation to fail.

Apparently, the value of pf_hashsize needs to be set and it should be used in mallocarray(9) calls rather than PF_HASHSIZ. 

Although, sizeof(struct pf_keyhash) = sizeof(struct pf_idhash) = 40, we cannot guarantee that the size of structs will stay the same (please correct me if I am wrong). 

Given that the for loop assumes V_ph_idhash and V_pf_keyhash are allocated memory by using the same multiplier, which is pf_hashsize, I think we either

 * should make a test before the mallocarray() calls and set pf_hashsize accordingly (how?)
 * make two mallocarray(...,M_NOWAIT) calls and test return values in a single if() statement. If either or both of these pointers is NULL, we should fallback and re-allocate memory for _both_  V_ph_idhash and V_pf_keyhash by using a single pf_hashsize value.
Comment 27 Kristof Provost freebsd_committer 2018-02-11 10:26:15 UTC
(In reply to fehmi noyan isi from comment #26)
Yes, your analysis looks to be correct. I'd go for the second option: try to allocate both keyhash and idhash with the requested size. If either one fails free both and re-try with the default size.

I don't think the sizes of the structs are relevant here. We allocate 'pf_hashsize' elements of both, but we don't care how much memory each allocation takes, just that we know how many there are.
Comment 28 fehmi noyan isi 2018-02-13 11:54:04 UTC
Created attachment 190574 [details]
patch with mallocarray(9), printf(9) and a single fallback block

Attached is the latest patch with a single fallback block which is executed depending on the return values from mallcoarray(... | M_NOWAIT) calls. The code sets the value of respective variable that is passed to mallocarray(9) rather than directly using PH_HASHSIZ in the mallocarray(9) call.

I tried the patch on a 512MB VM running FreeBSD-12.0CURRENT and verified that PF initialisation went okay. A log message indicating the memory allocation issue is present in /var/log/messages (and in the console).

root@test-vm:~ # uname -a

FreeBSD test-vm 12.0-CURRENT FreeBSD 12.0-CURRENT #13: Wed Feb 14 13:28:52 NZDT 2018     root@test-vm:/usr/obj/usr/src/amd64.amd64/sys/GENERIC  amd64
root@test-vm:~ # cat /boot/loader.conf
net.pf.states_hashsize="2147483648"
root@test-vm:~ # sysctl hw.realmem
hw.realmem: 536805376
root@test-vm:~ # kldstat
Id Refs Address            Size     Name
 1    1 0xffffffff80200000 20bf2d0  kernel
root@test-vm:~ # service pf onestart
Enabling pf.
root@test-vm:~ # kldstat
Id Refs Address            Size     Name
 1    3 0xffffffff80200000 20bf2d0  kernel
 2    1 0xffffffff82419000 33590    pf.ko
root@test-vm:~ # grep pf_initialize /var/log/messages
Feb 14 13:40:20 test-vm kernel: pf_initialize : Not enough memory for 85899345920 bytes.
Comment 29 Kristof Provost freebsd_committer 2018-02-13 12:03:59 UTC
(In reply to fehmi noyan isi from comment #28)
It's possible that one of the V_pf_keyhash or V_pf_idhash allocations succeeded, but not the other. That means you may have to free one of them.

(Note that free(NULL, M_PFHASH) is safe.)

It may be easier to discuss the patch if you post it on http://reviews.freebsd.org.
Comment 30 fehmi noyan isi 2018-02-14 01:43:47 UTC
(In reply to Kristof Provost from comment #29)

Calling free(9) before reallocating the memory was a no brainer. Thanks!

Below is the diff
https://reviews.freebsd.org/D14367
Comment 31 commit-hook freebsd_committer 2018-02-25 08:57:23 UTC
A commit references this bug:

Author: kp
Date: Sun Feb 25 08:56:44 UTC 2018
New revision: 329950
URL: https://svnweb.freebsd.org/changeset/base/329950

Log:
  pf: Cope with overly large net.pf.states_hashsize

  If the user configures a states_hashsize or source_nodes_hashsize value we may
  not have enough memory to allocate this. This used to lock up pf, because these
  allocations used M_WAITOK.

  Cope with this by attempting the allocation with M_NOWAIT and falling back to
  the default sizes (with M_WAITOK) if these fail.

  PR:		209475
  Submitted by:	Fehmi Noyan Isi <fnoyanisi AT yahoo.com>
  MFC after:	3 weeks
  Differential Revision:	https://reviews.freebsd.org/D14367

Changes:
  head/sys/net/pfvar.h
  head/sys/netpfil/pf/pf.c
Comment 32 fehmi noyan isi 2018-03-03 11:30:01 UTC
Thank you!
Comment 33 commit-hook freebsd_committer 2018-03-18 11:25:58 UTC
A commit references this bug:

Author: kp
Date: Sun Mar 18 11:25:40 UTC 2018
New revision: 331116
URL: https://svnweb.freebsd.org/changeset/base/331116

Log:
  MFC r329950:

  pf: Cope with overly large net.pf.states_hashsize

  If the user configures a states_hashsize or source_nodes_hashsize value we may
  not have enough memory to allocate this. This used to lock up pf, because these
  allocations used M_WAITOK.

  Cope with this by attempting the allocation with M_NOWAIT and falling back to
  the default sizes (with M_WAITOK) if these fail.

  PR:		209475
  Submitted by:	Fehmi Noyan Isi <fnoyanisi AT yahoo.com>

Changes:
_U  stable/11/
  stable/11/sys/net/pfvar.h
  stable/11/sys/netpfil/pf/pf.c
Comment 34 commit-hook freebsd_committer 2018-03-18 11:27:02 UTC
A commit references this bug:

Author: kp
Date: Sun Mar 18 11:26:07 UTC 2018
New revision: 331117
URL: https://svnweb.freebsd.org/changeset/base/331117

Log:
  MFC r329950:

  pf: Cope with overly large net.pf.states_hashsize

  If the user configures a states_hashsize or source_nodes_hashsize value we may
  not have enough memory to allocate this. This used to lock up pf, because these
  allocations used M_WAITOK.

  Cope with this by attempting the allocation with M_NOWAIT and falling back to
  the default sizes (with M_WAITOK) if these fail.

  PR:		209475
  Submitted by:	Fehmi Noyan Isi <fnoyanisi AT yahoo.com>

Changes:
_U  stable/10/
  stable/10/sys/net/pfvar.h
  stable/10/sys/netpfil/pf/pf.c