Summary: pfctl crash when load pf.conf, libc/resolv problem ?
Component: bin
Assignee: freebsd-pf (Nobody) <pf>
Severity: Affects Only Me
CC: fabrice.bruel, kp, olivier
Description fabrice.bruel 2016-02-10 15:54:20 UTC
Created attachment 166833: pf.conf file

Hello,

I'm using FreeBSD 9_STABLE as a firewall with pf.

# uname -a
FreeBSD FreeBSD 9.3 9.3-STABLE FreeBSD 9.3-STABLE #0 r294729: Tue Jan 26 22:00:32 CET 2016 root@9_STABLE:/usr/obj/usr/src/sys/FBSD9PF amd64

With a specific pf.conf file (attached to this message), pfctl -f pf.conf sometimes crashes with:

pfctl: failed to create table __automatic_4130873d_220 in : Cannot allocate memory
Segmentation fault: 11 (core dumped)

OK, my pf.conf file is bad and not optimized, but the syntax is valid. To reproduce the bug reliably, run this with the attached pf.conf and wait a few minutes:

while true;do pfctl -f pf.conf;done

I've tried to make sense of the core file, but I'm a newbie with gdb, so I reproduce here what I did:

# gdb
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd".
(gdb) core pfctl.core
Core was generated by `pfctl'.
Program terminated with signal 11, Segmentation fault.
#0 0x0000000800cfe6e6 in ?? ()
(gdb) add-symbol-file /usr/lib/debug/lib/libc.so.7.debug 0x0000000800cfe6e6
add symbol table from file "/usr/lib/debug/lib/libc.so.7.debug" at
.text_addr = 0x800cfe6e6
(y or n) y
Reading symbols from /usr/lib/debug/lib/libc.so.7.debug...done.
(gdb) bt
#0 0x0000000800cfe6e6 in .text ()
#1 0x0000000000000001 in ?? ()
#2 0x0000000000639668 in ?? ()
#3 0x00007fffffffd870 in ?? ()
#4 0x0000000801400000 in ?? ()
#5 0x0000000800000001 in ?? ()
#6 0x00000008018009d0 in ?? ()
#7 0x00000000ffffffff in ?? ()
#8 0x00000008014045d0 in ?? ()
#9 0x00000000ffffffff in ?? ()
#10 0x0000000801402ad0 in ?? ()
#11 0x00000008ffffffff in ?? ()
#12 0x00000008014024d0 in ?? ()
#13 0x00000008ffffffff in ?? ()
#14 0x00000008014021d0 in ?? ()
#15 0x00000000ffffffff in ?? ()
#16 0x0000000801401ed0 in ?? ()
#17 0x00007fffffffffff in ?? ()
#18 0x0000000801401a50 in ?? ()
#19 0x0000000800000001 in ?? ()
#20 0x0000000801401a50 in ?? ()
#21 0x0000000000000017 in ?? ()
#22 0x00007fffffffd5e0 in ?? ()
#23 0x0000000800d6dc29 in __printf_render_int (io=0x7, pi=0x6394b0, arg=<value optimized out>) at /usr/src/lib/libc/stdio/xprintf_int.c:422
#24 0x0000000800faab40 in ?? ()
#25 0x00007fffffffd33b in ?? ()
#26 0x0000000800d06eca in files_rpcent (retval=0x800cfc36f, mdata=<value optimized out>, ap=<value optimized out>) at /usr/src/lib/libc/rpc/getrpcent.c:317
#27 0x65726168732f6c61 in ?? ()
#28 0x62696c2f736c6e2f in ?? ()
#29 0x0074616300432f63 in ?? ()
#30 0x00007fffffffd400 in ?? ()
#31 0x0000000800652c00 in ?? ()
#32 0x00007fffffffd410 in ?? ()
#33 0x00007fffffffd3b0 in ?? ()
#34 0x0000000000000000 in ?? ()
(gdb) add-symbol-file /usr/lib/debug/lib/libc.so.7.debug 0x00007fffffffd3b0
add symbol table from file "/usr/lib/debug/lib/libc.so.7.debug" at
.text_addr = 0x7fffffffd3b0
(y or n) y
Reading symbols from /usr/lib/debug/lib/libc.so.7.debug...done.
(gdb) bt
#0 0x0000000800cfe6e6 in .text ()
#1 0x0000000000000001 in ?? ()
#2 0x0000000000639668 in ?? ()
#3 0x00007fffffffd870 in wcsxfrm_l (dest=0x7fffffffd0b0, src=0x7fffffffd0d0, len=6526232, locale=<value optimized out>) at /usr/src/lib/libc/string/wcsxfrm.c:126
#4 0x0000000000000002 in ?? ()
#5 0x0000000000000002 in ?? ()
#6 0x0000000800faab40 in ?? ()
#7 0x0000000800faab40 in ?? ()
#8 0x0000000800faab40 in ?? ()
#9 0x00007fffffffd33b in ?? ()
#10 0x0000000800d06eca in files_rpcent (retval=0x800d06eca, mdata=<value optimized out>, ap=<value optimized out>) at /usr/src/lib/libc/rpc/getrpcent.c:317
#11 0x0000000800d83e3e in __res_pquery (statp=0x7fffffffd320, msg=<value optimized out>, len=<value optimized out>, file=0x800cfc11a) at /usr/src/lib/libc/resolv/res_debug.c:305
#12 0x0000000000000000 in ?? ()
(gdb)

If my use of gdb is correct, the problem seems to be in /usr/src/lib/libc/resolv/res_debug.c?

I can send the core file, but it is 14 MB.

Thanks for your help,
Fabrice
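The reproduction loop above runs forever and does not say which iteration failed. A minimal sketch of a bounded variant that stops at the first non-zero exit status follows; the function name `repeat_until_fail` and the run limit are illustrative assumptions, not part of the original report.

```shell
#!/bin/sh
# repeat_until_fail MAX_RUNS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND up to MAX_RUNS times and stops at the
# first run that exits with a non-zero status.
repeat_until_fail() {
    max_runs=$1
    shift
    run=0
    while [ "$run" -lt "$max_runs" ]; do
        run=$((run + 1))
        status=0
        "$@" || status=$?          # capture the exit status without aborting
        if [ "$status" -ne 0 ]; then
            echo "run $run failed with exit status $status"
            return 1
        fi
    done
    echo "no failure in $max_runs runs"
    return 0
}

# Example, as root on the affected machine (the limit is arbitrary):
#   repeat_until_fail 100000 pfctl -f pf.conf
```

A process killed by a signal is normally reported by the shell as 128 plus the signal number, so the segmentation fault (signal 11) shown above would typically appear here as exit status 139.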
Comment 1 Kristof Provost 2016-02-12 13:05:11 UTC
I've had a quick look at this, and I think there are two problems.

The first is 'pfctl: failed to create table __automatic_4130873d_220 in : Cannot allocate memory'. For some reason the kernel is unable to create this table. That might be simple memory pressure (i.e. a combination of memory use and memory fragmentation).

The second is the crash of pfctl. That looks like heap corruption as a result of incorrect handling of the error from the kernel. For that one, rebuilding world with 'DEBUG_FLAGS=-g' and running pfctl in valgrind is quite useful.

I've had a quick test on 10 as well, and I've been unable to reproduce the problem there.
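The valgrind run suggested above could look roughly like the sketch below. It assumes valgrind is installed and that world has already been rebuilt with debug symbols; the wrapper name `run_memcheck`, the `VALGRIND` override variable and the log file name are illustrative choices, not anything pfctl or valgrind prescribes.

```shell
#!/bin/sh
# Assumed preparation (not performed by this script):
#   1. add DEBUG_FLAGS=-g to /etc/make.conf
#   2. rebuild and reinstall world, e.g.
#        cd /usr/src && make buildworld && make installworld

# run_memcheck LOGFILE COMMAND [ARGS...]
# Hypothetical wrapper: runs COMMAND under valgrind's default memcheck tool
# and writes the report to LOGFILE instead of stderr.
# Set VALGRIND to use a different valgrind binary.
run_memcheck() {
    log=$1
    shift
    "${VALGRIND:-valgrind}" --log-file="$log" "$@"
}

# Example, as root:
#   run_memcheck valgrind.output pfctl -f pf.conf
```

Writing the report to a file keeps it separate from pfctl's own error messages, which makes it easier to attach to the bug.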
Comment 3 fabrice.bruel 2016-02-15 09:25:26 UTC
Hello,

I've recompiled world with DEBUG_FLAGS=-g in /etc/make.conf, then run pfctl with my special pf.conf in valgrind. The output is in the attached file (valgrind.output).

Just for information, I use PF compiled into the kernel with the following options:

# pf options
device pf
device pflog
device pfsync

# altq(9). Enable the base part of the hooks with the ALTQ option.
# Individual disciplines must be built into the base system and can not be
# loaded as modules at this point. In order to build a SMP kernel you must
# also have the ALTQ_NOPCC option.
options ALTQ
options ALTQ_CBQ # Class Based Queueing
options ALTQ_RED # Random Early Drop
options ALTQ_RIO # RED In/Out
options ALTQ_HFSC # Hierarchical Packet Scheduler
options ALTQ_CDNR # Traffic conditioner
options ALTQ_PRIQ # Priority Queueing
options ALTQ_NOPCC # Required for SMP build
options ALTQ_DEBUG

Thanks,
Fabrice
Comment 4 Kristof Provost 2016-02-15 09:41:30 UTC
Yeah, so this:

==17184== by 0x404B46: pfctl_rules (pfctl.c:1486)
==17184== by 0x406DA7: main (pfctl.c:2378)
==17184== Address 0x6aa8a08 is 56 bytes inside a block of size 64 free'd
==17184== at 0x4C1E2DC: free (in /usr/local/lib/valgrind/vgpreload_memcheck-amd64-freebsd.so)
==17184== by 0x4210A0: superblock_free (pfctl_optimize.c:1640)
==17184== by 0x4233BE: pfctl_optimize_ruleset (pfctl_optimize.c:357)
==17184== by 0x40453B: pfctl_load_ruleset (pfctl.c:1297)
==17184== by 0x404B46: pfctl_rules (pfctl.c:1486)
==17184== by 0x406DA7: main (pfctl.c:2378)

is likely the reason your pfctl segfaults. There's a use after free. It's not the direct cause though; that's the kernel rejecting your rules.

Would it be possible to upgrade the machine to stable/10? It looks like the problem is fixed there.
Comment 5 fabrice.bruel 2016-02-17 10:19:53 UTC
Hello,

This dirty pf.conf has been loaded in a loop for the last 24 hours on FreeBSD 10_STABLE without any problem. So I think I need to migrate.

Thanks for your help,
Fabrice
Comment 6 fabrice.bruel 2016-08-26 09:02:15 UTC
Hello,

I was too hasty: the problem has not disappeared in 10-STABLE, but it is harder to reproduce. This time pfctl doesn't crash outright, but it can end up using all of the CPU. I'm still using the same dirty pf.conf. I'm attaching the new valgrind output, taken on:

# uname -a
FreeBSD FBSD10STABLE 10.3-STABLE FreeBSD 10.3-STABLE #2 r304805: Thu Aug 25 16:38:19 CEST 2016 root@FBSD10STABLE:/usr/obj/usr/src/sys/FBSD10PF amd64

Thanks for your help,
Fabrice
Comment 7 fabrice.bruel 2016-08-26 09:06:22 UTC
Created attachment 174090: Valgrind output in 10.3-STABLE
Comment 8 fabrice.bruel 2016-08-26 09:13:42 UTC
Sorry, I forgot DEBUG_FLAGS; the new valgrind output will follow in a few minutes!
Comment 9 fabrice.bruel 2016-08-26 13:48:32 UTC
Created attachment 174097: Valgrind output in 10.3-STABLE with debug
Comment 10 Kristof Provost 2016-08-28 16:47:39 UTC
Valgrind is not really producing anything useful here. It'd be interesting to see what pfctl is doing when it gets stuck using a lot of CPU time. Did truss show anything interesting?
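One way to look at a pfctl process that is already stuck is to find it by CPU usage and attach to it, rather than starting a fresh run under truss. The sketch below assumes the stuck process can be identified by its command name; the helper names `filter_busy` and `busy_pids` and the 90 percent threshold are hypothetical.

```shell
#!/bin/sh
# filter_busy NAME THRESHOLD
# Reads "pid pcpu command" lines on stdin and prints the pid of every
# process whose command is NAME and whose CPU usage is >= THRESHOLD percent.
filter_busy() {
    awk -v name="$1" -v thr="$2" \
        '$3 == name && ($2 + 0) >= (thr + 0) { print $1 }'
}

# busy_pids NAME THRESHOLD
# Applies the same filter to the live process table. The "keyword=" form
# of -o suppresses the ps header line.
busy_pids() {
    ps -A -o pid= -o pcpu= -o comm= | filter_busy "$1" "$2"
}

# Example, as root on FreeBSD:
#   for pid in $(busy_pids pfctl 90); do
#       procstat -kk "$pid"    # kernel stack of each thread
#       truss -p "$pid"        # attach and trace system calls (Ctrl-C to stop)
#   done
```

If truss prints nothing after attaching, the process may be spinning in userland or inside a single long-running system call, in which case the procstat kernel stacks are likely to be the more informative output.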
Comment 11 fabrice.bruel 2016-08-29 14:46:46 UTC
Hello,

OK, if I run truss pfctl.conf.anon, the output looks normal to me, at my newbie level. So first I ran a script that calls pfctl many times, until I had a pfctl process burning CPU. Then I ran truss pfctl.conf.anon again. I'm attaching that output here.

HTH.

Thanks,
Fabrice