Bug 191359

Summary:   [memguard] [panic] Memory modified after free w/MEMGUARD build
Product:   Base System
Component: kern
Version:   CURRENT
Hardware:  Any
OS:        Any
Status:    Closed FIXED
Severity:  Affects Some People
Priority:  ---
Reporter:  Peter Holm <pho>
Assignee:  freebsd-bugs (Nobody) <bugs>
CC:        benno, emaste, luke.tw, markj, ngie, op, svmhdvn

Description Peter Holm 2014-06-25 09:28:13 UTC
db:0:pho> bt
Tracing pid 785 tid 100153 td 0xfffff8002a52c490
uma_find_refcnt() at uma_find_refcnt+0x33/frame 0xfffffe17288d2590
mb_ctor_clust() at mb_ctor_clust+0x8f/frame 0xfffffe17288d25c0
uma_zalloc_arg() at uma_zalloc_arg+0x164/frame 0xfffffe17288d2660
m_getjcl() at m_getjcl+0xa3/frame 0xfffffe17288d26b0
m_getm2() at m_getm2+0xe7/frame 0xfffffe17288d2700
m_uiotombuf() at m_uiotombuf+0xa4/frame 0xfffffe17288d2770
sosend_generic() at sosend_generic+0x6cc/frame 0xfffffe17288d2820
sosend() at sosend+0x5d/frame 0xfffffe17288d2880
soo_write() at soo_write+0x42/frame 0xfffffe17288d28b0
dofilewrite() at dofilewrite+0x88/frame 0xfffffe17288d2900
kern_writev() at kern_writev+0x68/frame 0xfffffe17288d2950
sys_write() at sys_write+0x63/frame 0xfffffe17288d29a0
amd64_syscall() at amd64_syscall+0x278/frame 0xfffffe17288d2ab0

How to repeat:
sysctl vm.memguard.options=3; sysctl vm.memguard.desc=allocdirect +
ssh activity

Details: http://people.freebsd.org/~pho/stress/log/memguard4.txt
Comment 1 luke.tw 2015-01-07 13:44:34 UTC
Dear Peter, 

I managed to find the root cause.
The bug can be reproduced by setting "sysctl vm.memguard.options=2" and generating ssh activity:
   1. memguard.options=2 enables MemGuard to protect all allocations larger than PAGE_SIZE.
   2. ssh activity allocates mbufs from a zone with the UMA_ZONE_REFCNT flag, and that zone is protected by MemGuard.

However, these two features store their values in the same union, plinks, in struct vm_page (see the sketch below):
   1. MemGuard saves the allocation size in vm_page->plinks.memguard.v.
   2. UMA_ZONE_REFCNT saves the refcount in vm_page->plinks.s.pv.
        
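Roughly, the overlap looks like this. The snippet is a self-contained userland sketch that approximates the struct vm_page plinks union of that era; vm_page_sketch and main() are illustration only, not kernel code. On 64-bit, both fields land at the same offset:

/*
 * Approximate sketch of the plinks union in struct vm_page (not a
 * verbatim copy of sys/vm/vm_page.h).
 */
#include <sys/types.h>
#include <sys/queue.h>
#include <stddef.h>
#include <stdio.h>

struct vm_page_sketch {
	union {
		TAILQ_ENTRY(vm_page_sketch) q;		/* page queue / free list linkage */
		struct {
			SLIST_ENTRY(vm_page_sketch) ss;	/* eviction scan list */
			void *pv;			/* used by UMA_ZONE_REFCNT (refcount lookup) */
		} s;
		struct {
			u_long p;
			u_long v;			/* used by MemGuard (saved allocation size) */
		} memguard;
	} plinks;
};

int
main(void)
{
	/* Both members land at the same offset, so one overwrites the other. */
	printf("plinks.s.pv offset:       %zu\n",
	    offsetof(struct vm_page_sketch, plinks.s.pv));
	printf("plinks.memguard.v offset: %zu\n",
	    offsetof(struct vm_page_sketch, plinks.memguard.v));
	return (0);
}

A page that belongs to a UMA_ZONE_REFCNT zone and is also guarded by MemGuard therefore gets one of these values overwritten, which matches the crash in uma_find_refcnt() in the backtrace above.
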
The following patch can work around this bug.

Index: sys/vm/memguard.c
===================================================================
--- sys/vm/memguard.c   (revision 276729)
+++ sys/vm/memguard.c   (working copy)
@@ -506,6 +506,9 @@
            zone->uz_flags & UMA_ZONE_NOFREE)
                return (0);

+       if (zone->uz_flags & UMA_ZONE_REFCNT)
+               return (0);
+
        if (memguard_cmp(zone->uz_size))
                return (1);
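For context, memguard_cmp_zone() with this check applied would look roughly as follows. This is a sketch reconstructed from the hunk context above, not a verbatim copy of sys/vm/memguard.c; the MG_GUARD_NOFREE condition and the trailing name comparison against vm.memguard.desc are recollections of that revision and may differ in detail:

/*
 * Sketch of memguard_cmp_zone() with the workaround applied; returning 0
 * tells MemGuard to leave the zone alone, returning 1 guards it.
 */
int
memguard_cmp_zone(uma_zone_t zone)
{

	/* Assumed from that revision: skip NOFREE zones unless requested. */
	if ((memguard_options & MG_GUARD_NOFREE) == 0 &&
	    zone->uz_flags & UMA_ZONE_NOFREE)
		return (0);

	/*
	 * Proposed workaround: never guard UMA_ZONE_REFCNT zones, since
	 * MemGuard and the refcount lookup both store per-page data in
	 * vm_page->plinks and would clobber each other.
	 */
	if (zone->uz_flags & UMA_ZONE_REFCNT)
		return (0);

	if (memguard_cmp(zone->uz_size))
		return (1);

	/* Assumed: fall back to matching the name set via vm.memguard.desc. */
	return (strcmp(zone->uz_name, vm_memguard_desc) == 0);
}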
Comment 2 Enji Cooper 2015-01-12 19:18:29 UTC
pho reported that this change worked for him when running stress2. Either benno or I can take this bug and commit it to head.

Thank you very much for the patch!
Comment 3 luke.tw 2015-01-13 01:03:29 UTC
Great! Thanks for your effort.
Comment 4 Peter Holm 2015-01-22 13:58:12 UTC
I did some more testing with your patch and found that setting vm.memguard.frequency=1000 triggers a suspicious number of different panics. For example: http://people.freebsd.org/~pho/stress/log/memguard.frequency.txt
Comment 5 Enji Cooper 2015-02-17 09:50:55 UTC
CR: https://reviews.freebsd.org/D1865
Comment 6 Peter Holm 2015-02-17 11:57:09 UTC
Unfortunately, a new test of this patch shows the same problem as before:

root@t1:~ # sysctl vm.memguard.options=3; sysctl vm.memguard.desc=allocdirect
vm.memguard.options: 1 -> 3
vm.memguard.desc:  -> allocdirect
root@t1:~ # ssh pho@localhost
Memory modified after free 0xfffffe0000411000(4096) val=0 @ 0xfffffe0000411000


Fatal trap 12: page fault while in kernel mode
cpuid = 4; apic id = 04
fault virtual address   = 0x3000
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80bf0053
stack pointer           = 0x28:0xfffffe17287981b0
frame pointer           = 0x28:0xfffffe1728798200
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 922 (sshd)
[ thread pid 922 tid 100205 ]
Stopped at      uma_find_refcnt+0x33:   movq    (%rax),%rax
db> x/s version
version:        FreeBSD 11.0-CURRENT #3 r278882M: Tue Feb 17 12:42:28 CET 2015\012    pho@t1.osted.lan:/usr/src/sys/amd64/compile/MEMGUARD\012
db>
Comment 7 Enji Cooper 2015-04-14 21:58:30 UTC
Passing this bug, which I'm not actively working on, back to the general pool.
Comment 8 Siva Mahadevan 2017-08-28 20:18:00 UTC
(In reply to Peter Holm from comment #6)
I'm not able to reproduce this bug on FreeBSD 12 CURRENT using your steps. Are there any other configuration steps you've taken before running those commands? It could be that this bug has been fixed in CURRENT.

uname output: FreeBSD 12.0-CURRENT 2621be48c91(master): Mon Aug 28 14:45:10 EDT 2017
Comment 9 Peter Holm 2017-08-31 08:14:59 UTC
(In reply to Siva Mahadevan from comment #8)
It is hard for me to say if the original panic is still there.
With the same scenario I see:

panic: MemGuard detected double-free of 0xfffffe000075e000
cpuid = 2
time = 1504166229
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe2ebbde5db0
vpanic() at vpanic+0x19c/frame 0xfffffe2ebbde5e30
panic() at panic+0x43/frame 0xfffffe2ebbde5e90
memguard_free() at memguard_free+0x14f/frame 0xfffffe2ebbde5ed0
bufkva_free() at bufkva_free+0xf8/frame 0xfffffe2ebbde5ef0
buf_free() at buf_free+0xd5/frame 0xfffffe2ebbde5f40
brelse() at brelse+0x5c0/frame 0xfffffe2ebbde5fd0
bufdone_finish() at bufdone_finish+0xd4/frame 0xfffffe2ebbde5ff0
bufdone() at bufdone+0xe3/frame 0xfffffe2ebbde6020
biodone() at biodone+0x188/frame 0xfffffe2ebbde6060
g_io_deliver() at g_io_deliver+0x5e4/frame 0xfffffe2ebbde6140
biodone() at biodone+0x188/frame 0xfffffe2ebbde6180
g_io_deliver() at g_io_deliver+0x5e4/frame 0xfffffe2ebbde6260
biodone() at biodone+0x188/frame 0xfffffe2ebbde62a0
g_io_deliver() at g_io_deliver+0x5e4/frame 0xfffffe2ebbde6380
g_disk_done() at g_disk_done+0x1ee/frame 0xfffffe2ebbde6400
biodone() at biodone+0x188/frame 0xfffffe2ebbde6440
dadone() at dadone+0x194b/frame 0xfffffe2ebbde69a0
xpt_done_process() at xpt_done_process+0x35f/frame 0xfffffe2ebbde69e0
xpt_done_td() at xpt_done_td+0x136/frame 0xfffffe2ebbde6a30
fork_exit() at fork_exit+0x13b/frame 0xfffffe2ebbde6ab0

Details @ https://people.freebsd.org/~pho/stress/log/memguard8.txt
Comment 10 Mark Johnston 2021-04-01 15:58:51 UTC
There was a problem at one point with guarding mbufs using memguard; it should be fixed by https://cgit.freebsd.org/src/commit/?id=bc9d08e1cfe381f67fea89eff8f6235a15022494

I'm not sure what's going on in comment 9.  This looks a bit like a bug in memguard itself.  Does it still occur on a recent head?
Comment 11 Peter Holm 2021-04-01 18:30:13 UTC
No problems seen with this test scenario on main-n245775-9aef4e7c2bd.