Bug 209475

Summary: pf didn't check if enough free RAM for net.pf.states_hashsize
Product: Base System Reporter: Olivier Cochard <olivier>
Component: kernAssignee: freebsd-pf
Status: New ---    
Severity: Affects Only Me CC: david.marec, farrokhi, fnoyanisi
Priority: ---    
Version: 10.3-RELEASE   
Hardware: Any   
OS: Any   
Attachments:
Description Flags
Patch for the proposed fix.
none
The latest patch with console and logging facility outputs none

Description Olivier Cochard freebsd_committer 2016-05-12 23:03:40 UTC
I'm playing with benching some pf limits and I've found an easy way to block kldload pf.ko:

We just need to set a net.pf.states_hashsize value too big for the system available RAM.
This table is a static table with each entry consuming about 80 bytes of RAM.
Manual page ask to use power of 2 values, then I've try this big value on my 8GB RAM server:

echo 'net.pf.states_hashsize="67108864"' >>/boot/loader.conf
Notice that 67108864 * 80 should be close to 5 GB of RAM reserved.

Then when I run a "service pf onestart", this start a "kldload pf.ko", but kldload stuck and never give the console back on my 8GB RAM server (we can escape with a Ctrl+Z).
But if pf is enabled in rc.conf, this prevent the system to correctly boot.

A Ctrl+D give this kind of information:

load: 0.02  cmd: kldload 1166 [kmem arena] 180.93r 0.00u 0.00s 0% 1700k
load: 0.02  cmd: kldload 1166 [kmem arena] 182.69r 0.00u 0.00s 0% 1700k
load: 0.02  cmd: kldload 1166 [kmem arena] 182.96r 0.00u 0.00s 0% 1700k
load: 0.02  cmd: kldload 1166 [kmem arena] 183.50r 0.00u 0.00s 0% 1700k

During this status, issuing command "pfctl" will crash the system:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x0
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff81a311b4
stack pointer           = 0x28:0xfffffe00003aba60
frame pointer           = 0x28:0xfffffe00003ac7f0
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 1869 (pfctl)
trap number             = 12
panic: page fault
cpuid = 0
KDB: stack backtrace:
#0 0xffffffff807f5c70 at kdb_backtrace+0x60
#1 0xffffffff807bc276 at vpanic+0x126
#2 0xffffffff807bc143 at panic+0x43
#3 0xffffffff80b4874b at trap_fatal+0x36b
#4 0xffffffff80b48a4d at trap_pfault+0x2ed
#5 0xffffffff80b480ca at trap+0x47a
#6 0xffffffff80b2e2c2 at calltrap+0x8
#7 0xffffffff80715779 at devfs_ioctl_f+0x139
#8 0xffffffff8081013f at kern_ioctl+0x20f
#9 0xffffffff8080fe8c at sys_ioctl+0x15c
#10 0xffffffff80b490b8 at amd64_syscall+0x3a8
#11 0xffffffff80b2e5ab at Xfast_syscall+0xfb
Uptime: 1m44s
Comment 1 david.marec 2016-05-13 20:27:23 UTC
While  investigating, Olivier has suspected the hash table allocation to be responsible of the loading freeze.

by the fact, the allocation process within  pf_initialize() (sys/netpfil/pf/pf.c) function sets the "M_WAITOK" flag. This way, the initialization process might hang to wait for the requested resources.
Comment 2 Olivier Cochard freebsd_committer 2016-05-13 20:52:47 UTC
If I configure net.pf.states_hashsize to 134,217,728, this mean using 10GB of RAM on my 8GB only server, it will directly panic:
KDB: stack backtrace:
#0 0xffffffff807ecbb0 at kdb_backtrace+0x60
#1 0xffffffff807b4086 at vpanic+0x126
#2 0xffffffff807b3f53 at panic+0x43
#3 0xffffffff809b9b7b at vm_fault_hold+0x1bbb
#4 0xffffffff80b3d10c at trap_pfault+0x19c
#5 0xffffffff80b3c8fa at trap+0x47a
#6 0xffffffff80b22de2 at calltrap+0x8
#7 0xffffffff81811509 at pf_initialize+0x1c9
#8 0xffffffff81826775 at pf_modevent+0x125
#9 0xffffffff8079e0db at module_register_init+0xfb
#10 0xffffffff80793887 at linker_load_module+0xc07
#11 0xffffffff80794cd3 at kern_kldload+0xc3
#12 0xffffffff80794dab at sys_kldload+0x5b
#13 0xffffffff80b3d817 at amd64_syscall+0x2f7
#14 0xffffffff80b230cb at Xfast_syscall+0xfb
Uptime: 19s
Comment 3 fehmi noyan isi 2016-05-29 21:38:17 UTC
Created attachment 170810 [details]
Patch for the proposed fix.

Patch file is created with
$ diff -u pf.c pf.c.fni > pf.c.diff

where pf.c.fni is the modified version of the pf.c
Comment 4 fehmi noyan isi 2016-05-29 21:39:01 UTC
Hi,

In this forum post [1] from David, there is a bit of discussion about this PR (apart from the original question).

Would checking the requested amount of memory by malloc(9) against the available RAM (obtained via sysctl hw.physmem) be a good approach to avoid this problem?

To test this, I setup two identical VMs (in fact, VM2 is the copied & renamed version of VM1 bhyve image) with 512MB of RAM, running FreeBSD-CURRENT, single CPU core.

VM1 has the pf.c that comes with the FreeBSD-CURRENT source, whereas VM2 has a patched version of the file, which performs a sanity check on the requested memory by malloc(9) against the RAM size. If the requested memory is more than the available memory, ph_hashsize is set to PF_HASHSIZ, which 32768.

On both VMs, I set net.pf.states_hashsize to 2147483648 (way more than the RAM on the VM) via /boot/loader.conf. pf(4) is loaded with service(8) command. With this configuration on each VM, VM1 fails to start pf(4), whereas VM2 is successful to load the kernel module with net.pf.states_hashsize set to 32768.

VM1
----------------------------------------------------------
vm1 # uname -a
FreeBSD test-pf 11.0-CURRENT FreeBSD 11.0-CURRENT #0 r297692: Fri Apr  8 03:07:13 UTC 2016  root@releng2.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC  amd64
vm1 # cat /boot/loader.conf
net.pf.states_hashsize: 2147483648
vm1 # sysctl hw.realmem
hw.realmem: 536870912
vm1 # kldstat
Id Refs Address  Size  Name
1  3 0xffffffff80200000 1ee2bc0  kernel
vm1 # service pf onestart
Enabling pfKernel page fault with the following non-sleepable locks held:
exclusive rw pf rulesets (pf rulesets) r = 0 (0xffffffff822466e0) locked @ /usr/src/sys/modules/pf/../../netpfil/pf/pf_ioctl.c:2901
stack backtrace:
#0 0xffffffff80a91a90 at witness_debugger+0x70
#1 0xffffffff80a92d77 at witness_warn+0x3d7
#2 0xffffffff80e92817 at trap_pfault+0x57
#3 0xffffffff80e91ea4 at trap+0x284
#4 0xffffffff80e71ea7 at calltrap+0x8
#5 0xffffffff8090b646 at devfs_ioctl_f+0x156
#6 0xffffffff80a97106 at kern_ioctl+0x246
#7 0xffffffff80a96e51 at sys_ioctl+0x171
#8 0xffffffff80e92f6b at amd64_syscall+0x2db
#9 0xffffffff80e7218b at Xfast_syscall+0xfb


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x0
fault code     = supervisor read data, page not present
instruction pointer   = 0x20:0xffffffff8223001f
stack pointer    = 0x28:0xfffffe002b724310
frame pointer    = 0x28:0xfffffe002b724800
code segment     = base rx0, limit 0xfffff, type 0x1b
       = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags   = interrupt enabled, resume, IOPL = 0
current process     = 635 (pfctl)
[ thread pid 635 tid 100074 ]
Stopped at  pfioctl+0x4ef:  movq  (%rdi),%rsi
db> 


VM2
----------------------------------------------------------
vm2 # uname -a
FreeBSD test-pf 11.0-CURRENT FreeBSD 11.0-CURRENT #2: Sun May 29 12:06:57 NZST 2016  test@test-pf:/usr/obj/usr/src/sys/GENERIC  amd64
vm2 # cat /boot/loader.conf
net.pf.states_hashsize="2147483648"
vm2 # sysctl hw.realmem
hw.realmem: 536870912
vm2 # kldstat
Id Refs Address  Size  Name
1  1 0xffffffff80200000 1ee2bc0  kernel
vm2 # service pf onestart
Enabling pf.
vm2 # kldstat
Id Refs Address  Size  Name
1  3 0xffffffff80200000 1ee2bc0  kernel
2  1 0xffffffff82219000 34c30  pf.ko

[1] https://forums.freebsd.org/threads/56250/
Comment 5 Olivier Cochard freebsd_committer 2016-06-03 07:54:33 UTC
Thanks: Your patch works great!
Could you add a log warning message when the not enough memory condition is triggered ?
Comment 6 fehmi noyan isi 2016-08-15 10:25:00 UTC
Hi,

Somehow, I missed the notification for the latest comment! Sorry for the late reply....

It is possible to add a log warning message by using printf(9) or log(9), but I have noticed that pf_initialize() does things quietly most of the times.

Do you reckon using printf(9) or log(9) would be the proper approach for adding a log warning message for the not enough memory condition?
Comment 7 fehmi noyan isi 2016-11-02 08:09:35 UTC
Created attachment 176410 [details]
The latest patch with console and logging facility outputs

Hi there, 

I setup another VM running FreeBSD 12.0-CURRENT and updated the previous patch to reflect Olivier's suggestion with regards to logging.

I used printf(9) to print an error message similar to this

pf: Invalid states hash table size : 2147483648. Using the default value 32768.

Not mentioned in the original bug report, but value of f_srchashsize also needs to be verified against the available memory as setting net.pf.source_nodes_hashsize tunable to a value larger than malloc(9) can allocate causes the system to crash.

The snippet below was captured with script(1) utility from a FreeBSD 12.0-CURRENT system patched with attached diff file (note that console output is not captured, only tty)

VM3
----------------------------------------------------------
Script started on Wed Nov  2 20:07:09 2016
root@vm3:~ # uname -a
FreeBSD vm3 12.0-CURRENT FreeBSD 12.0-CURRENT #2: Tue Nov  1 20:27:46 NZDT 2016     root@vm3:/usr/obj/usr/src/sys/GENERIC  amd64
root@vm3:~ # cat /boot/loader.conf
net.pf.states_hashsize="2147483648"
root@vm3:~ # sysctl hw.realmem
hw.realmem: 536805376
root@vm3:~ # kldstat
Id Refs Address            Size     Name
 1    3 0xffffffff80200000 1fdc408  kernel
 2    1 0xffffffff82419000 2ac85    vboxguest.ko
root@vm3:~ # service pf onestart
Enabling pf.
root@vm3:~ # kldstat
Id Refs Address            Size     Name
 1    5 0xffffffff80200000 1fdc408  kernel
 2    1 0xffffffff82419000 2ac85    vboxguest.ko
 3    1 0xffffffff82444000 351a7    pf.ko
root@vm3:~ # dmesg | grep pf
pf: Invalid states hash table size : 2147483648. Using the default value 32768.
pf: Invalid source node hash table size : 0. Using the default value 8192.
root@vm3:~ # exit
exit

Script done on Wed Nov  2 20:08:23 2016
----------------------------------------------------------